...
Computer Vision Architecture: Object detection & tracking
Autonomous intruders and explosives identification require object detection and classification - commonly used neural networks YOLO and R-CNN series are typically used for this purpose. Taking into consideration of model performance and implementation, Yolov5 was employed. It is chosen because the average mAP from 0.5 to 0.95 was 37.2 and the average detection time on CPU was 98ms. To ensure accuracy, over 1000 images will be used to train the neural networks to identify both intruders and explosives for training. Once trained, the neural network is expected to provide both bounding box and centroid coordinates, both important for identifing where the intruder/explosive is in latitude and longitude and navigating the drone towards the explosive and the intruder.
Note: mAP stands for mean Average Precision, it indicates the performance of the model.
Computer Vision Architecture: Processing and Interfacing:
WARG's CV comes in the form of a Python program. The processing system is composed of subsystems instantiated as classes in a singleton pattern within a mediator class, which acts as the main entry point of the computer vision system. The subsystems include the video source module, which grabs video frames from the video source, a processing module, that runs the appropriate neural network for each task, and location modules that perform the required processes to turn data from the neural network into GPS coordinates or relative locations. Thus, the system output is an API that provides this location data to to Zeropilot's telemetry manager.
Autonomous Identification of humans:
For Autonomous autonomous identification of humans, there are three major components working together to make the entire system work: Intruder detection, intruder tracking and tracking-based driving.
The Intruder detection component is based on a deep learning approach. We will collect custom data on the person(intruders) walking/running from an aerial view and train our own Yolov5 model based on the pretrained pre-trained weights provided by ultralytics using MS COCO dataset. With this component, our drone will have the capability to detect in real-time the objects in the camera’s field of view on the drone. This component will take in the video stream from the camera and return the bounding box of the intruder in the image if he is in the field of view.
...