Tracking: Research Brief

Overview

Problem: We want to track the movement of a single intruder across multiple frames.

Algorithm Classifications

Detector: Detector algorithms run repeated detections and match objects between two consecutive detections to track them.
- Detector algorithms work well when objects have distinctive features relative to the background and, conversely, fail when objects resemble the background.
- Detector algorithms, especially online algorithms (those that learn at runtime), are slower than pure trackers.
Pure Tracking: Algorithms that use motion to estimate the future position of an object given its current position.
- Faster than detectors
- Less accurate when the camera is moving
- Can be less accurate overall, depending on the training context
Tracking Framework: More general tracking systems that can work in multiple contexts; they often combine the two methods above.
- Can be quite slow since they require both a tracking step and a detection step.
- Robust when occlusion occurs, but in situations where tracking is ineffective, the framework essentially behaves like a slower tracking algorithm.
- Can result in a high false positive rate.
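To make the distinction between the first two classes concrete, here is a minimal sketch, not taken from any of the candidate algorithms. The helper names (`iou`, `associate`, `predict_constant_velocity`), the `(x, y, width, height)` box format, the `min_iou` threshold, and the constant-velocity motion model are all assumptions chosen for illustration.

```python
from typing import List, Optional, Tuple

Box = Tuple[float, float, float, float]  # assumed format: (x, y, width, height)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two (x, y, w, h) boxes."""
    ax2, ay2 = a[0] + a[2], a[1] + a[3]
    bx2, by2 = b[0] + b[2], b[1] + b[3]
    inter_w = max(0.0, min(ax2, bx2) - max(a[0], b[0]))
    inter_h = max(0.0, min(ay2, by2) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def associate(previous: Box, detections: List[Box],
              min_iou: float = 0.3) -> Optional[Box]:
    """Detector-style step: keep the new detection that best overlaps the old box."""
    best = max(detections, key=lambda d: iou(previous, d), default=None)
    if best is None or iou(previous, best) < min_iou:
        return None  # no detection is close enough to be the same object
    return best


def predict_constant_velocity(prev: Box, curr: Box) -> Box:
    """Pure-tracking-style step: extrapolate the box assuming constant velocity."""
    dx, dy = curr[0] - prev[0], curr[1] - prev[1]
    return (curr[0] + dx, curr[1] + dy, curr[2], curr[3])
```

A tracking framework in the sense above would roughly use the prediction to narrow the search and the association step to correct drift.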

Algorithm Candidates

TLD: A tracking framework algorithm. Due to its poor performance, no further research was conducted.
- Precision is among the lowest (1).
- Works poorly with multiple targets.
- Slow.
GOTURN: Technically a tracking framework algorithm that uses deep learning. See (2) for an overview.
- Training Process: For a given frame, supply a base frame and a shifted frame along with a reference bounding box. The algorithm learns to find the target bounding box given the shift.
- Deployment: Tracker only needs to be initialized with an initial bounding box.
- The model architecture takes an image and identifies a search region based on a smooth movement assumption. Based on the previous frame, it identifies the target region in the current frame. The previous frame and current frame are passed through the first 5 layers in parallel before being combined for the last 5 layers (2).
- GOTURN shows very strong performance in UAV applications due to its flexibility and ability to deal with movement (3).
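The search-region step described above can be sketched as follows. This is an illustration of the smooth-movement assumption only, not GOTURN's actual preprocessing code: the function name `search_region`, the `(x, y, width, height)` box format, and the default `scale` of 2.0 are assumptions.

```python
from typing import Tuple

Box = Tuple[int, int, int, int]  # assumed format: (x, y, width, height)


def search_region(prev_box: Box, frame_size: Tuple[int, int],
                  scale: float = 2.0) -> Box:
    """Crop centered on the previous box, `scale` times its size, clipped to the frame.

    Under a smooth-movement assumption the target should still lie inside
    this region in the current frame.
    """
    x, y, w, h = prev_box
    frame_w, frame_h = frame_size
    cx, cy = x + w / 2.0, y + h / 2.0          # center of the previous box
    crop_w, crop_h = w * scale, h * scale      # enlarged search window
    x1 = max(0, int(round(cx - crop_w / 2.0)))
    y1 = max(0, int(round(cy - crop_h / 2.0)))
    x2 = min(frame_w, int(round(cx + crop_w / 2.0)))
    y2 = min(frame_h, int(round(cy + crop_h / 2.0)))
    return (x1, y1, x2 - x1, y2 - y1)
```

A larger `scale` tolerates faster motion at the cost of more background inside the crop.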
KCF: A pure tracking algorithm that is essentially ridge regression trained on all cyclic shifts of an image to minimize the error against the desired response. Training is fast because circulant matrices (matrices whose rows are cyclic shifts of one another) are diagonalized by the discrete Fourier transform (4)(5).
- Training Process: Covered in (4). Consider a regression model trained to minimize the error between its output and the expected response. In this case, the training samples are the cyclic shifts of the input frame. Cyclic shifts are best described in (5). The key point is that this regression can be solved very quickly thanks to a mathematical property of cyclic shifts, also best described in (5).
- Deployment: Tracker only needs an initial bounding box.
- KCF shows lower performance in UAV applications (3), likely because it is a pure tracking algorithm: it assumes a given amount of movement and has no detection step, which makes it difficult to train correctly.
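The speed-up from cyclic shifts can be checked numerically. The sketch below covers only the linear, one-dimensional case (no kernel, no image features), so it is a simplification of KCF rather than the tracker itself. The function names and the signal length are assumptions. With rows built by `np.roll(x, i)` and NumPy's FFT convention, the ridge solution reduces to an element-wise division in the Fourier domain; other conventions move the complex conjugate.

```python
import numpy as np


def circulant(x: np.ndarray) -> np.ndarray:
    """Data matrix whose i-th row is x cyclically shifted by i positions."""
    return np.stack([np.roll(x, i) for i in range(len(x))])


def ridge_direct(x: np.ndarray, y: np.ndarray, lam: float) -> np.ndarray:
    """Ridge regression solved explicitly: w = (X^T X + lam I)^-1 X^T y."""
    X = circulant(x)
    return np.linalg.solve(X.T @ X + lam * np.eye(len(x)), X.T @ y)


def ridge_fourier(x: np.ndarray, y: np.ndarray, lam: float) -> np.ndarray:
    """Same solution via the DFT: no matrix is ever built or inverted."""
    x_hat = np.fft.fft(x)
    y_hat = np.fft.fft(y)
    w_hat = x_hat * y_hat / (np.abs(x_hat) ** 2 + lam)
    return np.fft.ifft(w_hat).real


n = 8
rng = np.random.default_rng(0)
x = rng.standard_normal(n)                  # stand-in for an image row
shift = np.minimum(np.arange(n), n - np.arange(n))
y = np.exp(-0.5 * shift ** 2)               # desired response, peaked at zero shift

# Both routes give the same weights; the Fourier route costs O(n log n).
same = np.allclose(ridge_direct(x, y, 0.1), ridge_fourier(x, y, 0.1))
```

The direct solve needs the full n-by-n matrix of shifts, while the Fourier version works on vectors of length n, which is why training on every cyclic shift remains cheap.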

References

Test Procedure

KCF

Use a training video dataset that we already have (Stanford Drone or others).

Manually determine the initial bounding box.

Set up a test script that initializes the tracker with the manual bounding box, then passes in each frame and displays the results. Use this tutorial as reference.

Manually evaluate the results.
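A possible shape for the test script is sketched below. It assumes the `opencv-contrib-python` package, where `cv2.TrackerKCF_create()` is expected to be available (some OpenCV versions expose it under `cv2.legacy` instead). The helper names `make_kcf_tracker`, `read_frames`, and `run_tracker` are hypothetical, and the OpenCV imports are deferred so the loop itself can be exercised without OpenCV installed.

```python
from typing import Iterable, Iterator, List, Optional, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height)


def make_kcf_tracker():
    """Create a KCF tracker (assumption: opencv-contrib-python is installed)."""
    import cv2
    return cv2.TrackerKCF_create()


def read_frames(video_path: str) -> Iterator:
    """Yield frames from a video file one at a time."""
    import cv2
    capture = cv2.VideoCapture(video_path)
    try:
        while True:
            ok, frame = capture.read()
            if not ok:
                break
            yield frame
    finally:
        capture.release()


def run_tracker(tracker, frames: Iterable, init_box: Box) -> List[Optional[Box]]:
    """Initialize on the first frame, then update on every later frame.

    Returns one entry per frame: the tracked box, or None where the
    tracker reported a failure.
    """
    results: List[Optional[Box]] = []
    for index, frame in enumerate(frames):
        if index == 0:
            tracker.init(frame, init_box)
            results.append(tuple(init_box))
            continue
        ok, box = tracker.update(frame)
        results.append(tuple(int(v) for v in box) if ok else None)
    return results
```

A run would then look like `run_tracker(make_kcf_tracker(), read_frames(path), box)`, where `path` and `box` stand for the chosen video file and the manually determined bounding box. The returned list can be drawn over the frames for the manual review step.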

DeepSORT

Only proceed if KCF results are questionable and we’ve trained our detection model for people.

Integrate this deep_sort code with this pre-built YOLOv5 model after it has been trained on our training data.

Follow the test procedure above; more detail TBD.
