Dataset Creation and Preparation

Data Collection Information

Data is usually collected using in-flight imagery. See the specifications below.

Software: See Image Collection Repository

Hardware:

  • Flight computer: a Raspberry Pi with a 32GB microSD card is sufficient

  • Camera: The same one as used at competition (e.g. $200 CV Camera)

Flight: Find the drone height in the appropriate AEAC document: SysInt

Data Cleaning

Once images have been collected, images containing features of interest (e.g. landing pads) are kept, and the rest are discarded. This work can be parallelized by uploading the images to a folder on OneDrive and creating a sign-up sheet so that each member processes a few hundred images at a time:

  • Members can use the Tiles view to see the images without having to click through each one individually.

  • Images can be deleted in the OneDrive webpage.

Instructions from 2023/2024 can be found at 2023/24 Landing Pad Data Cleaning Instructions for reference.
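The sign-up sheet described above can be drafted with a short script. This is a minimal sketch, not the team's actual tooling: the function names, the chunk size of 300, and the example file names are all assumptions made for illustration.

```python
def chunk_images(names, chunk_size=300):
    """Split image names into consecutive chunks, in sorted order.

    chunk_size=300 is an assumed value for "a few hundred images".
    """
    ordered = sorted(names)
    return [ordered[i:i + chunk_size] for i in range(0, len(ordered), chunk_size)]


def signup_rows(names, chunk_size=300):
    """Return (chunk number, first image, last image, count) rows for a sign-up sheet."""
    rows = []
    for number, chunk in enumerate(chunk_images(names, chunk_size), start=1):
        rows.append((number, chunk[0], chunk[-1], len(chunk)))
    return rows


if __name__ == "__main__":
    # Hypothetical file names; replace with the real folder listing.
    example = [f"img_{i:05d}.png" for i in range(1000)]
    for row in signup_rows(example):
        print(row)
```

Each row gives a member a contiguous range of file names, which matches how images appear when the OneDrive folder is sorted by name.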

Data Labelling and Roboflow Upload

Once images have been cleaned, they are downloaded and then uploaded to Roboflow for labelling. A Roboflow account with a non-existent email can be created (e.g. warg-0a@warg.warg).

Each Roboflow project can have at most 3 collaborators, so multiple projects may need to be created. It is unclear whether simultaneous logins cause accounts to be banned, so it is safer to create new accounts for every project and to keep a sign-up sheet recording which members are currently logged into each account.

Note: We would like versions of the dataset in varying sizes, and this is easiest to set up when the dataset is first uploaded. Repeat the process below once per dataset size, uploading a different proportion of the images into each project (recommended: small, medium, and large datasets that are approximately 10%, 30%, and 60% of the total dataset size respectively).
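One way to prepare the three upload sets is sketched below. It assumes the small, medium, and large datasets are disjoint portions of the cleaned images (10% + 30% + 60% = 100%); if nested subsets are wanted instead, the slicing would need to change. The function name, the fixed seed, and the file names are hypothetical.

```python
import random


def split_by_size(names, fractions=(0.1, 0.3, 0.6), seed=0):
    """Shuffle image names and partition them into small/medium/large sets.

    Assumes the three sets are disjoint. The last set takes whatever
    remains, so rounding never drops or duplicates an image.
    """
    shuffled = sorted(names)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    total = len(shuffled)
    small_end = round(total * fractions[0])
    medium_end = small_end + round(total * fractions[1])
    return {
        "small": shuffled[:small_end],
        "medium": shuffled[small_end:medium_end],
        "large": shuffled[medium_end:],
    }


if __name__ == "__main__":
    images = [f"img_{i:05d}.png" for i in range(1000)]  # hypothetical names
    for size, subset in split_by_size(images).items():
        print(size, len(subset))
```

Shuffling before slicing avoids putting all images from one flight into the same project, since consecutive file names usually come from consecutive frames.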

Ongoing projects should have existing Roboflow accounts and projects (made by the process below).

Creating a new Roboflow Project:

  1. Create a new Roboflow account (e.g. warg-0a@warg.warg), referred to below as the main account: link

  2. In the main account, create a new workspace (e.g. warglandingpad)

  3. In the workspace, create a new project (e.g. warglandingpad)

    a) Typically, the use case requires Object Detection.

    b) Add the first class; additional classes are added in further instructions.

  4. Log out of main account.

  5. Create 2 additional Roboflow accounts (e.g. warg-0b@warg.warg, warg-0c@warg.warg)

  6. Log out of additional accounts; log into main account.

  7. In the main account, navigate to the workspace (left bar)

  8. Invite the additional Roboflow accounts (top right)

  9. Log out of main account; log into additional accounts.

  10. In the additional Roboflow accounts, accept the invitation (bell icon in top right)

  11. Log out of additional accounts; log into main account.

  12. In the main account, navigate to the workspace (left bar)

  13. Navigate to the project (bottom center)

  14. Uploading the dataset:

    a) Upload the images (left bar)

    b) The images are cached locally in the browser (i.e. in RAM), so upload them in several smaller batches rather than all at once, and give each batch a descriptive name

    c) Save and Continue

    d) Repeat until the images for the project are uploaded

  15. Classes:

    a) Navigate to Annotate (left bar)

    b) Assign a batch to the main account

    c) Click on an image

    d) Use the bounding box tool (right bar) to draw a box anywhere

    e) Add the new class

    f) Repeat steps d) and e) until all classes have been added

    g) Click on the project name (top left)

    h) Click on the 3 dots (left bar) and navigate to Project Settings

    i) Check Lock Annotation Classes

    j) Return to the image that was just labelled

    k) Delete the boxes

Repeat for as many projects as required. Try to make a variety of sizes for each dataset uploaded.

When a batch is labelled, verify the labels are correctly placed by sampling a few images (scroll down and randomly pick an image, repeat).
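If the file names in a batch are available locally, the random spot check can also be made reproducible with a small script. This is only a sketch: the function name, the sample size of 10, and the file names are assumptions, and the images still have to be opened in Roboflow to inspect the labels.

```python
import random


def pick_for_review(names, count=10, seed=None):
    """Pick up to `count` distinct image names at random for a label check."""
    pool = list(names)
    rng = random.Random(seed)  # pass a seed to get the same picks again
    return sorted(rng.sample(pool, min(count, len(pool))))


if __name__ == "__main__":
    batch = [f"img_{i:05d}.png" for i in range(250)]  # hypothetical batch
    for name in pick_for_review(batch, count=10, seed=1):
        print(name)
```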

Once the batch is verified, add that batch to the dataset with the default 70/20/10 split.
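As a quick sanity check on the numbers Roboflow reports, the expected size of each part can be estimated as below. This sketch assumes the 70/20/10 split refers to train, validation, and test in that order; Roboflow may round differently, so counts can be off by an image or two. The function name is hypothetical.

```python
def split_counts(total, ratios=(70, 20, 10)):
    """Estimate (train, valid, test) image counts for a percentage split.

    The last part takes the remainder so the counts always sum to `total`.
    """
    train = total * ratios[0] // 100
    valid = total * ratios[1] // 100
    test = total - train - valid
    return train, valid, test


if __name__ == "__main__":
    print(split_counts(250))
```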