2025 Model Training: YOLO11n vs YOLO11s

  • Inference latency: YOLO11n ≈ YOLOv8n (2.9 ms) < YOLO11s (3.8 ms)

  • Accuracy: YOLO11s > YOLO11n > YOLOv8n

  • Model size: YOLO11n < YOLOv8n < YOLO11s

  • We are running real-time inference on the airside, where hardware resources are limited. High latency lets the video input worker queue fill up, causing framerate drops and forcing decisions to be made on outdated positions (see the queue sketch after this list).

  • Our safest bet is YOLO11n: it has slightly higher accuracy and lower or similar latency (depending on the dataset) than YOLOv8n, a model that was proven to work last term.

  • However, YOLO11s can be noticeably more accurate than YOLO11n, at the cost of roughly 10–20% higher latency than YOLOv8n on published benchmarks (about 31% higher in our 4070 validation below).

  • It might be worth trying YOLO11s, since accuracy was more of an issue than latency last term, though YOLO11n may already be slightly more accurate than YOLOv8n.

  • The choice depends on our latency requirements and on how much latency increases when running inference on airside hardware.

  • Model performance is highly dataset-dependent, as demonstrated in the research below, which uses a dataset other than COCO.
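
To make the queueing concern above concrete, here is a minimal sketch of a latest-frame-wins input queue: when inference is slower than the camera framerate, the stalest frame is dropped so decisions are always based on the newest position. The class and parameter names are illustrative, not taken from our codebase.

```python
import queue

class LatestFrameQueue:
    """Bounded queue that discards the oldest frame when full,
    so the inference worker always sees recent data."""

    def __init__(self, maxsize: int = 2) -> None:
        self._queue = queue.Queue(maxsize=maxsize)

    def put(self, frame) -> None:
        # If inference can't keep up, drop the stalest frame
        # instead of blocking the camera thread.
        while True:
            try:
                self._queue.put_nowait(frame)
                return
            except queue.Full:
                try:
                    self._queue.get_nowait()  # discard the oldest frame
                except queue.Empty:
                    pass

    def get(self, timeout: float = 1.0):
        # Called by the inference worker; blocks briefly for the next frame.
        return self._queue.get(timeout=timeout)
```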

Validation with the Roboflow dataset on a 4070 (definitions at the end):

Model    | Latency (ms) | mAP 50-95 | mAP 50 | Precision | Recall
YOLOv8n  | 2.9          | 0.78      | 0.994  | 0.983     | 0.99
YOLO11n  | 2.9          | 0.809     | 0.995  | 0.991     | 0.992
YOLO11s  | 3.8          | 0.82      | 0.995  | 0.995     | 0.992
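
The table above can be reproduced with the Ultralytics validation API; a minimal sketch, assuming the Roboflow export's data.yaml sits at a placeholder path:

```python
from ultralytics import YOLO

# Validate all three candidates on the same split for a fair comparison.
for weights in ("yolov8n.pt", "yolo11n.pt", "yolo11s.pt"):
    model = YOLO(weights)
    metrics = model.val(data="datasets/roboflow/data.yaml")  # placeholder path
    print(
        weights,
        f"latency={metrics.speed['inference']:.1f} ms",
        f"mAP50-95={metrics.box.map:.3f}",
        f"mAP50={metrics.box.map50:.3f}",
        f"precision={metrics.box.mp:.3f}",
        f"recall={metrics.box.mr:.3f}",
    )
```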

Graph (double click to enlarge) | YOLOv8n              | YOLO11n                                  | YOLO11s
Confusion Matrix                | confusion_matrix.png | confusion_matrix11n-20250702-191747.png  | confusion_matrix-20250702-222634.png
Normalized Confusion Matrix     | confusion_matrix_normalized.png | confusion_matrix_normalized11n-20250702-191746.png | confusion_matrix_normalized-20250702-222634.png
Precision-Recall Curve          | PR_curve.png         | PR_curve-20250702-191746.png             | PR_curve-20250702-222633.png
F1-Confidence Curve             | F1_curve.png         | F1_curve-20250702-191746.png             | F1_curve-20250702-222633.png
Precision-Confidence Curve      | P_curve.png          | P_curve-20250702-191746.png              | P_curve-20250702-222634.png
Recall-Confidence Curve         | R_curve.png          | R_curve-20250702-191746.png              | R_curve-20250702-222634.png

Ultralytics Performance Metrics (COCO Dataset)  

https://docs.ultralytics.com/models/yolo11/#performance-metrics
https://docs.ultralytics.com/models/yolov8/#performance-metrics

螢幕擷取畫面 2025-07-01 132926.png
螢幕擷取畫面 2025-07-01 133207.png

Roboflow leaderboard (COCO 2017):

Computer Vision Model Leaderboard

image-20250701-174829.png
image-20250701-175222.png

 

Other research: Evaluating the Evolution of YOLO models  

image-20250701-175616.png
image-20250701-175636.png

 

image-20250701-175708.png
image-20250701-175737.png

Definitions:

  • mAP val: Mean Average Precision computed on the validation set during training; the area under the precision (accuracy of positive predictions) vs. recall (fraction of ground-truth instances detected) curve.

  • mAP50: mean Average Precision at an Intersection over Union (IoU) threshold of 0.5. A detection counts as correct if the predicted bounding box overlaps the ground-truth box by at least 50% (see the sketch after this list).

  • mAP50-95: mAP averaged over IoU thresholds from 0.5 to 0.95 in steps of 0.05.

  • Speed ONNX (ms): inference time with the model exported to ONNX (Open Neural Network Exchange, an open-source format for representing machine learning and deep learning models); the Ultralytics tables report this on CPU.

  • Speed T4 TensorRT10 (ms): inference time using TensorRT 10 on an NVIDIA T4 GPU.

  • FLOPs: floating-point operations (a measure of model complexity).

  • F1 score: harmonic mean of precision (accuracy of positive predictions) and recall (how well the model identifies all relevant instances).
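
As a concrete illustration of these definitions, here is a small sketch computing IoU between two boxes and the F1 score from precision and recall (the example values are made up, except the YOLO11n precision/recall from the table above):

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes in (x1, y1, x2, y2) format."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# A prediction counts toward mAP50 only if IoU >= 0.5:
print(iou((0, 0, 10, 10), (5, 0, 15, 10)))  # 0.333 -> a miss at IoU 0.5
print(f1(0.991, 0.992))                     # ~0.991 for YOLO11n above
```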

System Requirements: https://docs.ultralytics.com/help/FAQ/#what-are-the-system-requirements-for-running-ultralytics-models

For Raspberry Pi: https://docs.ultralytics.com/guides/raspberry-pi/

image-20250701-175922.png
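
For airside deployment on a Raspberry Pi, the Ultralytics guide linked above recommends exporting to NCNN for ARM CPU inference; a minimal sketch, assuming we settle on the YOLO11n weights:

```python
from ultralytics import YOLO

# Export the trained model to NCNN, the format the Raspberry Pi
# guide recommends for CPU inference on the Pi.
model = YOLO("yolo11n.pt")
model.export(format="ncnn")  # writes the yolo11n_ncnn_model/ directory

# Reload the exported model and run inference just like the .pt version.
ncnn_model = YOLO("yolo11n_ncnn_model")
results = ncnn_model.predict(source="test.jpg")  # placeholder test image
```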