Model Training Repository

Overview

Training is run on the WARG desktop on the Windows partition. A default Ultralytics model is loaded (e.g. nano) with no initial weights. Training configurations are described here: https://docs.ultralytics.com/usage/cfg/
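
As a rough illustration, the sketch below shows how a model can be built from an architecture file (so it starts without pretrained weights) and trained with the Ultralytics Python API. The model file name yolov8n.yaml, the dataset file dataset.yaml, and every value in TRAIN_SETTINGS are assumptions chosen for illustration; they are not taken from the repository's training.py.

```python
"""Sketch: train a default Ultralytics nano model from scratch."""

# Hypothetical values; the repository's own configuration may differ.
MODEL_CONFIG = "yolov8n.yaml"  # architecture file only, so no pretrained weights
TRAIN_SETTINGS = {
    "data": "dataset.yaml",    # hypothetical dataset description file
    "epochs": 50,
    "imgsz": 640,
    "batch": 16,
    "pretrained": False,       # do not load pretrained weights
}


def run_training(model_config=MODEL_CONFIG, settings=None):
    """Build the model from its YAML definition and start training."""
    # Imported here so the settings above can be inspected without Ultralytics.
    from ultralytics import YOLO

    model = YOLO(model_config)
    return model.train(**(settings or TRAIN_SETTINGS))
```

Calling run_training() starts a training run. The keys in TRAIN_SETTINGS correspond to entries on the configuration page linked above.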

Repository:

GitHub - UWARG/model-training: ML model training for object detection

Software

Setup

Follow the instructions to clone the repository and activate the environment: Autonomy Workflow Software

Install packages:

pip install -r requirements.txt

Run initial training to create the configuration files:

python -m training

If it reaches the dataset checking phase, press Ctrl+C to stop the program. It will probably fail before that point with a file-not-found error.

Open the Ultralytics configuration file:

  • Windows: C:/Users/[Username]/AppData/Roaming/Ultralytics/settings.yaml

  • macOS: ~/Library/Application Support/Ultralytics/settings.yaml

  • Linux: ~/.config/Ultralytics/settings.yaml
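
The three locations above can also be resolved from Python. This is a minimal sketch using only the standard library; the function name ultralytics_settings_path is hypothetical, and the Windows branch assumes that the APPDATA environment variable points at AppData/Roaming.

```python
import os
import platform
from pathlib import Path


def ultralytics_settings_path(system=None, home=None, appdata=None):
    """Return the expected settings.yaml location for the given OS."""
    system = system or platform.system()
    home = Path(home) if home else Path.home()
    if system == "Windows":
        # %APPDATA% normally points at C:/Users/[Username]/AppData/Roaming
        default = home / "AppData" / "Roaming"
        base = Path(appdata or os.environ.get("APPDATA", default))
    elif system == "Darwin":  # macOS
        base = home / "Library" / "Application Support"
    else:  # Linux and other Unix-like systems
        base = home / ".config"
    return base / "Ultralytics" / "settings.yaml"


if __name__ == "__main__":
    path = ultralytics_settings_path()
    print(path, "(exists)" if path.exists() else "(not created yet)")
```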

The first three lines each name a directory. Go to each of those locations and delete the directory there:

  • Example: runs_dir: C:\Users\WARG\model-training\runs, so go to model-training and delete runs

Change the first three lines to this:

datasets_dir: C:\Users\WARG\Ultralytics\datasets
weights_dir: C:\Users\WARG\Ultralytics\weights
runs_dir: C:\Users\WARG\Ultralytics\runs

Use other directories if desired.
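
If you prefer to script this edit, the following sketch rewrites the three directory entries in the text of the settings file and leaves every other line untouched. It is an illustration only: redirect_directories is a hypothetical helper, and it assumes that each setting sits on its own "key: value" line.

```python
from pathlib import PureWindowsPath

DIR_KEYS = ("datasets_dir", "weights_dir", "runs_dir")


def redirect_directories(settings_text, base_dir):
    """Return the settings text with the three directory keys moved under base_dir."""
    base = PureWindowsPath(base_dir)
    lines = []
    for line in settings_text.splitlines():
        key = line.split(":", 1)[0].strip()
        if key in DIR_KEYS:
            # datasets_dir -> <base>\datasets, and likewise for weights and runs
            folder = key[: -len("_dir")]
            line = f"{key}: {base / folder}"
        lines.append(line)
    return "\n".join(lines) + "\n"
```

To apply it, read settings.yaml as text, pass it through redirect_directories together with the chosen base directory (for example C:\Users\WARG\Ultralytics), and write the result back to the same file.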

On Windows, importing PyTorch may fail with an error along the lines of "path/to/fbgemm.dll" or one of its dependencies is missing. First check whether the file at that path is actually missing. Usually the DLL itself is present and the missing piece is its dependency, libomp, which can be installed as part of the Microsoft Visual C++ Redistributable. If that still does not work, you can directly download it from here and put it in C:\Windows\System32.
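
One way to check whether the DLL itself is present, without triggering the failing import, is sketched below. It assumes that the Windows build of PyTorch keeps its DLLs in a lib directory inside the installed torch package; find_torch_dll is a hypothetical helper name.

```python
import importlib.util
from pathlib import Path


def find_torch_dll(dll_name="fbgemm.dll"):
    """Return the path to a DLL in torch's lib directory, or None if absent."""
    # find_spec locates the package on disk without importing it.
    spec = importlib.util.find_spec("torch")
    if spec is None or not spec.submodule_search_locations:
        return None  # PyTorch is not installed in this environment
    for package_dir in spec.submodule_search_locations:
        candidate = Path(package_dir) / "lib" / dll_name
        if candidate.is_file():
            return candidate
    return None


if __name__ == "__main__":
    print(find_torch_dll() or "fbgemm.dll not found in the torch package")
```

If this prints a path, the DLL exists and the error most likely comes from one of its dependencies.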

Usage

Move or copy the three dataset directories (test, train, val) into the datasets directory (datasets_dir in the settings file).

Make sure that any old datasets are out of this directory or have their test, train, val directories renamed (e.g. test_landing_pad, train-old, val0)! Hiding them in a directory underneath datasets is not sufficient (e.g. ...\datasets\landing_pad\test might still be erroneously used).
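
A quick way to catch leftover datasets before training is to scan the datasets directory for nested test, train, or val directories. The sketch below is one possible check; find_stray_splits is a hypothetical helper, and it treats the three top-level directories as the intended dataset.

```python
import os

SPLIT_NAMES = {"test", "train", "val"}


def find_stray_splits(datasets_dir):
    """List test/train/val directories nested below the top level of datasets_dir."""
    root = os.path.abspath(datasets_dir)
    stray = []
    for current, dirnames, _ in os.walk(root):
        if current == root:
            # Top-level test/train/val are the active dataset: do not descend into them.
            dirnames[:] = [d for d in dirnames if d not in SPLIT_NAMES]
            continue
        stray.extend(os.path.join(current, d) for d in dirnames if d in SPLIT_NAMES)
    return sorted(stray)
```

An empty list means no nested test, train, or val directories were found. Renamed directories such as train-old are ignored, matching the advice above.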

Navigate into the repository and activate the environment: Autonomy Workflow Software

Run training:

python -m training

Training will take a few hours.

If training is interrupted, change the model load path in training.py to the checkpoint's path, where [latest training number] is the number of the checkpoint.
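
To find the most recent checkpoint, the sketch below searches the runs directory for the highest-numbered training run that contains a last.pt file. It assumes the usual Ultralytics layout of runs_dir/detect/train, train2, train3, ... with checkpoints under weights/; latest_checkpoint is a hypothetical helper, and the layout should be confirmed against the actual runs directory.

```python
import re
from pathlib import Path


def latest_checkpoint(runs_dir, task="detect"):
    """Return last.pt from the highest-numbered train run, or None if there is none."""
    best_number, best_path = -1, None
    for run in (Path(runs_dir) / task).glob("train*"):
        match = re.fullmatch(r"train(\d*)", run.name)
        checkpoint = run / "weights" / "last.pt"
        if match is None or not checkpoint.is_file():
            continue  # not a numbered run, or it never saved a checkpoint
        number = int(match.group(1) or 1)  # plain "train" counts as the first run
        if number > best_number:
            best_number, best_path = number, checkpoint
    return best_path
```

The returned path is a candidate for the model load path. Runs are compared by number rather than by name, so train10 is ranked after train9.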

Hardware

Each epoch takes approximately 5 minutes to complete on an NVIDIA GeForce RTX 2060 with 6GB VRAM: https://www.techpowerup.com/gpu-specs/geforce-rtx-2060.c3310

  • There is only enough VRAM for nano and small models, not larger ones.

  • PyTorch version 2.4.1 is being used with CUDA 12.4, which has minimum driver requirements. Please ensure that a suitable driver is installed (a high enough driver version should be available for the RTX 2060; you can find drivers here if needed). We are not going to use the old Jetson TX2i, but rather the RPi5 on the drone.
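
To confirm that the installed PyTorch build and the GPU driver work together, a short report like the following sketch can help. cuda_report is a hypothetical helper; it only reads values that PyTorch exposes and does not check the driver version itself.

```python
def cuda_report():
    """Summarize the PyTorch/CUDA setup, or note that PyTorch cannot be imported."""
    try:
        import torch
    except (ImportError, OSError) as error:
        # OSError covers DLL loading failures on Windows.
        return {"torch_importable": False, "error": str(error)}
    report = {
        "torch_importable": True,
        "torch_version": torch.__version__,   # 2.4.1 is the version named above
        "cuda_build": torch.version.cuda,     # None on CPU-only builds
        "cuda_available": torch.cuda.is_available(),
    }
    if report["cuda_available"]:
        props = torch.cuda.get_device_properties(0)
        report["gpu_name"] = props.name
        report["vram_gib"] = round(props.total_memory / 2**30, 1)
    return report


if __name__ == "__main__":
    for key, value in cuda_report().items():
        print(f"{key}: {value}")
```

If cuda_available is False even though cuda_build shows a CUDA version, an outdated or missing driver is a likely cause.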

WARG desktop details: WARG Desktop

More detailed CUDA compatibility information: CUDA and PyTorch
