Model Training Repository
Overview
Training is run on the WARG desktop on the Windows partition. A default Ultralytics model architecture is loaded (e.g. the nano variant) with no initial weights. Training configurations are described here: Configuration
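For reference, loading a default model with no initial weights looks roughly like the following (a sketch; the model name and exact call are assumptions, since the repository's training.py defines the real configuration):

from ultralytics import YOLO

# Building from a .yaml configuration creates the architecture only, with no pretrained weights.
model = YOLO("yolov8n.yaml")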
Repository:
GitHub - UWARG/model-training: ML model training for object detection
Software
Setup
Follow the instructions to clone the repository and activate the environment: Autonomy Workflow Software
Install packages:
pip install -r requirements.txt
Run initial training to create the configuration files:
python -m training
If it reaches the dataset checking phase, press Ctrl+C to stop the program. It will probably fail before that point with a file-not-found error.
Open the Ultralytics configuration file:
Windows:
C:/Users/[Username]/AppData/Roaming/Ultralytics/settings.yaml
MacOS:
~/Library/Application Support/Ultralytics/settings.yaml
Linux:
~/.config/Ultralytics/settings.yaml
Go to the directory listed in each of the first 3 lines and delete the directory there.
Example:
runs_dir: C:\Users\WARG\model-training\runs
so go to model-training and delete runs.
Change the first 3 lines to this:
datasets_dir: C:\Users\WARG\Ultralytics\datasets
weights_dir: C:\Users\WARG\Ultralytics\weights
runs_dir: C:\Users\WARG\Ultralytics\runs
Use other directories if desired.
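Alternatively, the same values can be set from Python using the Ultralytics settings object (a sketch; verify the keys against the installed Ultralytics version):

from ultralytics import settings

# Point Ultralytics at the shared directories instead of the repository.
settings.update({
    "datasets_dir": r"C:\Users\WARG\Ultralytics\datasets",
    "weights_dir": r"C:\Users\WARG\Ultralytics\weights",
    "runs_dir": r"C:\Users\WARG\Ultralytics\runs",
})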
On Windows, if there is an error when importing PyTorch along the lines of "path/to/fbgemm.dll or one of its dependencies is missing", you can check whether the DLL is actually missing. Usually it is not fbgemm.dll itself that is missing, but its dependency libomp. You can install libomp as part of the Microsoft Visual C++ redistributable. If that still does not work, you can download it directly from here and put it in C:\Windows\System32.
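A quick way to check whether the DLL itself is present without triggering the import error (a sketch; the file name comes from the error message above):

import importlib.util
import pathlib

# Locate the installed torch package without importing it.
spec = importlib.util.find_spec("torch")
lib_dir = pathlib.Path(spec.origin).parent / "lib"
print("fbgemm.dll present:", (lib_dir / "fbgemm.dll").exists())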
Usage
Move or copy the 3 directories of the dataset so that they are in the dataset directory.
Make sure that any old datasets are out of this directory or have their test, train, val directories renamed (e.g. test_landing_pad, train-old, val0)! Hiding them in a subdirectory underneath the dataset directory is not sufficient (e.g. ...\datasets\landing_pad\test might still be erroneously used).
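To double-check what will be picked up, the following sketch lists every test, train, and val directory under the datasets directory (the path is the one configured above):

import pathlib

# Any directory printed here other than the intended dataset may be used erroneously.
datasets_dir = pathlib.Path(r"C:\Users\WARG\Ultralytics\datasets")
for path in sorted(datasets_dir.rglob("*")):
    if path.is_dir() and path.name in ("test", "train", "val"):
        print(path)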
Navigate into the repository and activate the environment: Autonomy Workflow Software
Run training:
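This is presumably the same command as in the Setup section:
python -m training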
Training will take a few hours.
If training is interrupted, change the model load path in training.py to point at the latest checkpoint, where [latest training number] is the number of the checkpoint.
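A sketch of what that change might look like, assuming the standard Ultralytics resume pattern and the runs directory configured above (the exact path and variable names in training.py may differ):

from ultralytics import YOLO

# last.pt is the checkpoint saved at the end of the most recent epoch.
model = YOLO(r"C:\Users\WARG\Ultralytics\runs\detect\train[latest training number]\weights\last.pt")
model.train(resume=True)  # resume restores epochs, optimizer state, and hyperparameters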
Hardware
Each epoch takes approximately 5 minutes to complete on an NVIDIA GeForce RTX 2060 with 6GB VRAM: https://www.techpowerup.com/gpu-specs/geforce-rtx-2060.c3310
There is only enough VRAM for nano and small models, not larger ones.
PyTorch version 2.4.1 is being used with CUDA 12.4, which has minimum driver requirements. Please ensure that a suitable driver is installed (there should be a high enough driver version for the RTX 2060; you can find drivers here if needed). (We are not going to use the old Jetson TX2i, but the RPi5 on the drone.)
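A quick check that the driver and CUDA build are visible to PyTorch (standard PyTorch calls; the expected values follow from the versions above):

import torch

print("PyTorch:", torch.__version__)      # expected 2.4.1
print("CUDA build:", torch.version.cuda)  # expected 12.4
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("Device:", torch.cuda.get_device_name(0))  # expected GeForce RTX 2060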
WARG desktop details: WARG Desktop
More detailed CUDA compatibility information: CUDA and PyTorch