2022 Computer Vision Architecture Overview

Goals

Note: The goals stated here are for the 2021-2022 season. That season's competition has concluded, but we will retain this architecture, and future competition goals may be similar.

If you’re unfamiliar with the competition we’re participating in, here’s a summary of the competition CONOPS (Concept of Operations; essentially the rules and goals).

With respect to the CONOPS, the Computer Vision subteam at WARG was responsible for completing a few main tasks:

  • Identifying humans from the air from within a video stream

  • Identifying a 'package' from the air from within a video stream

  • Geolocating

  • Scanning QR codes (this is for scanning the 'package')

  • Streaming and displaying geolocation data on the ground station

To do so, the CV system is segmented into a list of modules, each of which assists in accomplishing one of the above tasks.


The Modules

It may be useful to explore our CV repository as you read this guide (https://github.com/UWARG/computer-vision-python).

DeckLinkSRC

Video data for the CV system originates from a GoPro mounted on the plane. The GoPro uses a Blacksheep video transmitter to send live-stream data to the ground. There, the receiver is connected to a video card on the CV computer, known as the DeckLink, which interprets and processes video input. The DeckLinkSRC program was designed to extract video data from this card and feed it into the CV program’s data pipelines.

However, this implementation has since changed, and the module now simply reads from an OBS stream which pulls data directly from the DeckLink. This module is also responsible for handling the life cycle of the video stream, including saving it for future training and closing the stream when prompted.

Command

The command module serves as the communication interface between the firmware and CV systems. It includes two major components: the POGI (Plane Out, Ground In) system, which transfers telemetry data from the XBee (firmware RF receiver) to the CV system, and the PIGO (Plane In, Ground Out) system, which transfers CV data about pylon or box locations to the firmware autopilot. Data is transferred via two shared JSON files that are used by both the CV and firmware systems.
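As a rough illustration of file-based exchange like the one described above, the sketch below writes and reads a JSON file with Python's standard json module. The file name and field names (pigo.json, latitude, longitude) are placeholders rather than the actual schema, and writing to a temporary file before renaming it is an assumption added here so that a reader never sees a half-written file; the repository may handle this differently.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Optional


def write_json_atomically(path: Path, payload: dict) -> None:
    """Write payload to path so a concurrent reader never sees a partial file."""
    # Write to a temporary file in the same directory, then rename it over
    # the target. os.replace is atomic when both paths share a filesystem.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as tmp_file:
            json.dump(payload, tmp_file)
        os.replace(tmp_name, path)
    except BaseException:
        # Clean up the temporary file if writing or renaming failed.
        os.unlink(tmp_name)
        raise


def read_json(path: Path) -> Optional[dict]:
    """Return the parsed contents of path, or None if it is missing or invalid."""
    try:
        with open(path) as json_file:
            return json.load(json_file)
    except (FileNotFoundError, json.JSONDecodeError):
        return None


if __name__ == "__main__":
    # Hypothetical PIGO-style message; the real field names may differ.
    with tempfile.TemporaryDirectory() as folder:
        pigo_path = Path(folder) / "pigo.json"
        write_json_atomically(pigo_path, {"latitude": 43.47, "longitude": -80.54})
        print(read_json(pigo_path))
```

Returning None for a missing or unparsable file lets the reading side skip a cycle and try again, rather than crash while the other system has not written anything yet.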

Target Acquisition

This module hosts and runs the YOLOv5 object detection neural network, which detects humans within a video frame. Object detection produces a so-called “bounding box” that marks the rough location of a target object within an image. The bounding box includes data about its height, width, and the x-y coordinates of its top-right corner. Note that the coordinate system of an image is such that the origin is at the top left, and the bottom right is at (+IMAGE_WIDTH, +IMAGE_HEIGHT). For 2022, we’ve additionally implemented DeepSORT, a tracking algorithm that uses YOLOv5 detections to track humans across the frames of a video.
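To make the image coordinate system concrete, here is a minimal sketch of a bounding box as described above: a width, a height, and the coordinates of one corner, with the origin at the top left and y growing downward. The class and method names are hypothetical and not taken from the repository. The sketch follows the text in storing the top-right corner; if the actual code stores a different corner, only the arithmetic in the two methods changes.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class BoundingBox:
    """Axis-aligned box in image coordinates (origin top left, y grows downward)."""

    x: float       # x of the top-right corner, in pixels (assumed convention)
    y: float       # y of the top-right corner, in pixels (assumed convention)
    width: float   # horizontal extent, in pixels
    height: float  # vertical extent, in pixels

    def center(self) -> Tuple[float, float]:
        # From the top-right corner, the center is half a width to the left
        # and half a height down (down is +y in image coordinates).
        return (self.x - self.width / 2, self.y + self.height / 2)

    def fits_in(self, image_width: int, image_height: int) -> bool:
        # The box spans [x - width, x] horizontally and [y, y + height] vertically.
        return (
            0 <= self.x - self.width
            and self.x <= image_width
            and 0 <= self.y
            and self.y + self.height <= image_height
        )


if __name__ == "__main__":
    box = BoundingBox(x=400, y=100, width=200, height=300)
    print(box.center())
    print(box.fits_in(1920, 1080))
```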

Geolocation

The geolocation module implements a mathematical algorithm that maps bounding boxes detected within an image to actual coordinates on the ground, based on telemetry data such as altitude, GPS coordinates of the plane, and camera gimbal orientation. The algorithm itself is a bit too involved to cover here. If you’re interested in learning more about the math behind it, you may refer to this document (https://docs.google.com/document/d/1VKYIrmWJYfpLPCAjQPiH1J5fnQPtNcjEC2NE5Eg4QxU/edit).

QR Scanner

The QR scanner module uses OpenCV and the pyzbar library to detect QR codes within a video frame, decode them, and display the decoded information alongside a bounding box for the pilot to see.

Timestamp

The timestamp module simply attaches time stamps to pieces of data as they are queued into the data pipelines (more on that below), to help with tasks such as matching image data with telemetry data.

MergeImageWithTelemetry

This module merges the image pipeline with the telemetry pipeline by finding the telemetry data whose timestamp is closest to that of the image data. It then associates the two and outputs a MergedData object.
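The closest-timestamp matching described above can be sketched as follows. This is an illustration under assumptions, not the repository's implementation: the Timestamped wrapper, the fields of MergedData, and the function names are hypothetical, and the sketch assumes the telemetry records are already sorted by time so a binary search can be used.

```python
from bisect import bisect_left
from dataclasses import dataclass
from typing import Any, Sequence


@dataclass
class Timestamped:
    timestamp: float  # seconds, e.g. from time.time()
    data: Any


@dataclass
class MergedData:
    image: Any      # hypothetical field name
    telemetry: Any  # hypothetical field name


def closest_by_time(target: float, records: Sequence[Timestamped]) -> Timestamped:
    """Return the record whose timestamp is nearest to target.

    records must be non-empty and sorted by timestamp (oldest first).
    """
    times = [record.timestamp for record in records]
    index = bisect_left(times, target)
    # The nearest record is either just before or just after the insertion point.
    candidates = records[max(index - 1, 0):index + 1]
    return min(candidates, key=lambda record: abs(record.timestamp - target))


def merge(image: Timestamped, telemetry: Sequence[Timestamped]) -> MergedData:
    match = closest_by_time(image.timestamp, telemetry)
    return MergedData(image=image.data, telemetry=match.data)


if __name__ == "__main__":
    telemetry = [
        Timestamped(10.0, "telemetry A"),
        Timestamped(10.5, "telemetry B"),
        Timestamped(11.0, "telemetry C"),
    ]
    print(merge(Timestamped(10.6, "frame"), telemetry))
```

In a live system the telemetry records would arrive through a queue rather than a list, so a real worker would also need to decide how long to wait for a newer record before settling on a match.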

 

You will probably notice other modules within the computer-vision-python repo. These modules (e.g. searchExplosive) were used in either the 2021 or 2022 USC competition and are not currently relevant to the summer 2022 program.


Main Program & Multiprocessing

 

The CV architecture is very object-oriented. The modules that are discussed above are all instantiated as objects within the main program (located in the root directory).

At any given time, there are several different tasks the CV system is responsible for. During flight, for instance, the program needs to retrieve frames from the video stream, run a detection model on them, and run the geolocation module if a pylon is found, all while using the command module to get telemetry and communicate with the autopilot. Doing all of this fast enough to keep up with a live video stream is not practical in a single sequential program.

Instead, parallelism (running multiple processes at once) becomes important. In Python, this is enabled by the multiprocessing library, which takes advantage of multiple CPU cores to perform several operations simultaneously (in technical terms, in parallel). In the CV system, most modules have their own “process”.

With several operations happening at once, passing data works differently from the sequential programs you may be used to. The CV architecture uses so-called “pipelines” of data: FIFO (First In, First Out) data structures called queues, which handle streams of data. One pipeline may hold POGI information, while another may hold video frame data.

These pipelines feed into and out of so-called “worker functions”, one for each module within the CV architecture. A worker function defines what data, if any, its module takes in and what data, if any, it outputs. Worker functions that only take in data are called consumers, those that only emit data are called producers, and those that do both are called producer-consumers.
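A minimal sketch of these ideas, using only Python's multiprocessing module, might look like the following. The worker names (frame_producer, detector, reporter) and the STOP sentinel used for shutdown are illustrative assumptions and do not come from the repository; the "detection" step is a placeholder string rather than a real model.

```python
import multiprocessing as mp

STOP = None  # sentinel placed on a queue to tell the next worker to finish


def frame_producer(out_queue, frame_count: int) -> None:
    """Producer: only emits data (stand-in for a video input worker)."""
    for frame_number in range(frame_count):
        out_queue.put(frame_number)
    out_queue.put(STOP)


def detector(in_queue, out_queue) -> None:
    """Producer-consumer: takes frames in and emits results."""
    while True:
        frame = in_queue.get()
        if frame is STOP:
            break
        out_queue.put(f"detections for frame {frame}")  # placeholder for real work
    out_queue.put(STOP)


def reporter(in_queue) -> list:
    """Consumer: only takes data in. Returns what it saw, for convenience."""
    seen = []
    while True:
        item = in_queue.get()
        if item is STOP:
            return seen
        print(item)
        seen.append(item)


def main() -> None:
    frames = mp.Queue()      # pipeline: frame_producer -> detector
    detections = mp.Queue()  # pipeline: detector -> reporter
    processes = [
        mp.Process(target=frame_producer, args=(frames, 3)),
        mp.Process(target=detector, args=(frames, detections)),
        mp.Process(target=reporter, args=(detections,)),
    ]
    for process in processes:
        process.start()
    for process in processes:
        process.join()


if __name__ == "__main__":
    main()
```

Because each worker only talks to its queues, the same functions can be exercised one after another in a single process, which is convenient for testing. The `if __name__ == "__main__":` guard matters on platforms where child processes re-import the main module.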

In this manner, producer-consumers, pipelines, and processes compose the main concepts of the CV multiprocessing system.


Approach to Testing

Unit tests are included in each of the modules and are written with the pytest library. The integration testing strategy and the unit testing strategy are each covered on their own pages.
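For readers new to pytest, the sketch below shows the general shape of a unit test. pytest collects functions whose names start with test_ from files named test_*.py or *_test.py, and treats a failing assert as a test failure. The attach_timestamp function is a hypothetical stand-in, not the Timestamp module's actual code; passing the clock in as a parameter is an assumption made here so that the test can use a fixed time.

```python
import time
from typing import Any, Callable, Tuple


def attach_timestamp(data: Any, clock: Callable[[], float] = time.time) -> Tuple[float, Any]:
    """Pair data with the current time; the clock is injectable so tests can fix it."""
    return (clock(), data)


def test_attach_timestamp_uses_clock_value():
    # A fixed clock makes the expected output exact.
    stamped = attach_timestamp("frame", clock=lambda: 123.0)
    assert stamped == (123.0, "frame")


def test_attach_timestamp_defaults_to_wall_clock():
    # Without a fixed clock, only check that the stamp falls in a known window.
    before = time.time()
    timestamp, _ = attach_timestamp("frame")
    after = time.time()
    assert before <= timestamp <= after
```

Saved as, say, test_timestamp.py (a hypothetical file name), this would be run with the pytest command from the same directory.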