Tech Report - Shrinjay
Analysis of Alternate Solutions
To analyze our solutions, we started by defining requirements for the computer vision system, derived from the components of each task. We also prioritized these requirements as high, medium, or low priority. High priority requirements are ones we are confident we can implement and that have a direct impact on our ability to complete the task. Medium priority requirements would improve our performance on a task but are not essential to completing it, and low priority requirements are stretch goals we will pursue only if time allows.
For Task 1:
Provide a video feed to the pilot to allow them to visually identify the package - High Priority
For Task 2:
Scan the QR code provided, send the coordinates to the autopilot, and send the questions to the ground station - High Priority
Provide a video feed to the pilot to allow them to identify and visually track the intruders - High Priority
Detect human bodies in images to help the pilot identify and visually track the intruders - Medium Priority
Provide coordinates of the intruders to the aircraft to allow it to autonomously position itself over the intruders - Low Priority
Automatically generate a trace of the intruders using position data collected from object detection and geolocation - Low Priority
To meet these requirements, the CV system needs to be able to perform the following operations:
1. Detect a human being in an image frame.
2. Find the position of a human being in world space from their position in image space.
3. Communicate with a ground station and autopilot chip.
4. Capture video and transmit it to our ground station and to our onboard CV computer.
5. Provide an application to allow the ground crew to view the video feed and read questions from the QR scan.
Therefore, the alternate solutions involve alternate methods of implementing each of these operations.
As multiple operations are part of one task, our first decision is how to structure and combine operations. We can either break the code into modules that each encompass one operation and are then combined for a task, or combine all operations into a single function that encompasses a task. We chose to break our code into modules because some operations, such as communication, are reused across tasks. Further, decoupling pieces of our software improves maintainability since failures can be isolated, speeds development since work can be done in parallel, and eases expansion since new operations can be added without changing other modules. We then had the choice to run these modules sequentially in one process or concurrently, and we chose to multi-thread our software, as this improves performance by allowing different processor cores to handle different operations while making use of the resources we have available.
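To illustrate this structure, the following minimal sketch (in Python, with hypothetical module names and placeholder stubs rather than our actual code) shows how the operations could run as separate threads connected by queues:

```python
# Minimal sketch of the modular, multi-threaded layout described above.
# Each worker would wrap one operation (capture, detection, communication);
# the functions below are placeholder stubs, not the team's real modules.
import queue
import threading
import time

def grab_frame():
    # Placeholder for the camera capture call (e.g. an OpenCV read).
    time.sleep(0.1)
    return object()

def detect_humans(frame):
    # Placeholder for the detector; would return bounding boxes in a real module.
    return []

def send_to_ground_station(detections):
    # Placeholder for the communication module (e.g. an XBee serial write).
    pass

frame_queue = queue.Queue(maxsize=8)       # frames from the capture module
detection_queue = queue.Queue(maxsize=8)   # detections for the comms module

def capture_worker():
    while True:
        frame_queue.put(grab_frame())

def detection_worker():
    while True:
        detection_queue.put(detect_humans(frame_queue.get()))

def comms_worker():
    while True:
        send_to_ground_station(detection_queue.get())

if __name__ == "__main__":
    for worker in (capture_worker, detection_worker, comms_worker):
        threading.Thread(target=worker, daemon=True).start()
    time.sleep(1.0)  # let the pipeline run briefly in this demo
```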
We will now consider alternate solutions for the implementation of each operation.
Operation #1 must be automated using an object detection technique. Because we can easily collect training data on human bodies, a neural-network approach is suitable; a classical method such as Viola-Jones would require manually identifying specific human features for training. There are a number of neural-network object detection methods, such as R-CNN, RetinaNet, YOLO, and many more. At the time we made our selection, YOLOv5 performed best in object detection benchmarks. Further, open-source implementations of YOLOv5 are readily available, saving us the time of implementing it ourselves.
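As an illustration of how little code an off-the-shelf detector requires, the sketch below loads a pretrained YOLOv5 model through PyTorch Hub and filters its detections to the COCO "person" class; the confidence threshold and image path are placeholder values, not our final configuration:

```python
# Person detection with an off-the-shelf YOLOv5 model via PyTorch Hub
# (requires torch; the model weights are downloaded on first run).
import torch

model = torch.hub.load("ultralytics/yolov5", "yolov5s", pretrained=True)
model.conf = 0.4  # discard low-confidence detections (illustrative threshold)

results = model("frame.jpg")  # any image path or numpy array
for x1, y1, x2, y2, conf, cls in results.xyxy[0].tolist():
    if int(cls) == 0:  # COCO class 0 is "person"
        print(f"person at ({x1:.0f}, {y1:.0f})-({x2:.0f}, {y2:.0f}), conf {conf:.2f}")
```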
For operation #2, we can broadly use visual methods, which build a perspective transform from image and aircraft pose data alone, or non-visual methods, which use a different form of sensor to identify the position of the target or to gather additional information for computing a perspective transform. Non-visual methods could include simple rangefinding with radar or LIDAR, or a more complex point cloud built from LIDAR. Non-visual methods determine position more directly because they collect distance data, whereas visual methods determine distance from camera intrinsics and physical models of light. We chose a visual method because we only need to identify the target's position roughly for visual tracking, so the additional accuracy of a non-visual method would not be worth the added cost and engineering of acquiring and integrating extra sensors.
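The sketch below shows the general idea of such a visual method: a pixel is projected onto a flat ground plane using the camera intrinsics and the aircraft pose. The intrinsics, rotation, and altitude here are placeholder values; a real implementation would take them from camera calibration and autopilot telemetry:

```python
# Project a pixel onto a flat ground plane (z = 0) using the pinhole model.
import numpy as np

def pixel_to_ground(u, v, K, R_cam_to_world, cam_pos_world):
    """Intersect the viewing ray through pixel (u, v) with the ground plane z = 0."""
    ray_cam = np.linalg.inv(K) @ np.array([u, v, 1.0])  # ray in camera frame
    ray_world = R_cam_to_world @ ray_cam                # rotate into world frame
    t = -cam_pos_world[2] / ray_world[2]                # scale to reach z = 0
    return cam_pos_world + t * ray_world                # ground point (x, y, 0)

K = np.array([[800.0, 0.0, 320.0],    # placeholder intrinsics (fx, fy, cx, cy)
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
R = np.array([[1.0, 0.0, 0.0],        # camera pointing straight down (nadir)
              [0.0, -1.0, 0.0],
              [0.0, 0.0, -1.0]])
cam_pos = np.array([0.0, 0.0, 30.0])  # aircraft 30 m above the ground

print(pixel_to_ground(400, 300, K, R, cam_pos))
```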
For operation #3, we have to choose how we transmit the data and how we represent it. To transmit data to the ground, a wireless method must be used, all of which are ultimately some form of RF. We chose XBees as our RF transmitters and receivers because we already had them in stock, minimizing cost. To transmit data to the autopilot chip, we use a wired communication protocol to maximize reliability, and because a wireless method would give little benefit with the two chips so close together. While various protocols exist, including USB, UART, CAN, SPI, and others, we chose UART for its simplicity and the pre-existing expertise on the team. To represent data, we opted to design a custom encoding, as off-the-shelf encodings such as MAVLink require extra development with no benefit for us given the simplicity of our communications.
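As a rough illustration of what such a custom encoding might look like (the message layout below is hypothetical, not our actual wire format), a message can be framed with a start byte, a type byte, a fixed payload, and a simple checksum:

```python
# Hypothetical custom message framing for the UART/XBee links.
import struct

START_BYTE = 0x7E

def encode_target(msg_type: int, lat: float, lon: float) -> bytes:
    """Pack a message as: start byte, type, latitude, longitude, 1-byte checksum."""
    body = struct.pack("<Bdd", msg_type, lat, lon)
    checksum = sum(body) & 0xFF
    return bytes([START_BYTE]) + body + bytes([checksum])

def decode_target(frame: bytes):
    """Validate the checksum and unpack the payload; returns (type, lat, lon)."""
    assert frame[0] == START_BYTE, "bad start byte"
    body, checksum = frame[1:-1], frame[-1]
    assert sum(body) & 0xFF == checksum, "checksum mismatch"
    return struct.unpack("<Bdd", body)

frame = encode_target(0x01, 45.0, -75.0)  # placeholder coordinates
print(decode_target(frame))
# On the aircraft, the same bytes would be written to the autopilot UART,
# e.g. with pyserial: serial.Serial("/dev/ttyTHS1", 57600).write(frame)
```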
For operation #4, we were restricted to using CSI for video transmission to the CV computer, as that is what our NVIDIA Jetson supports. We therefore had to choose a camera that supports CSI, and the best-known, best-supported option is the Raspberry Pi Camera, which we chose. To transmit video to the ground, we opted for an off-the-shelf solution given the complexity of encoding video data ourselves. We chose OpenHD running on our own hardware over a DJI Lightbridge because the DJI system was too expensive.
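For reference, a CSI camera on the Jetson is typically read through a GStreamer pipeline; the sketch below assumes an OpenCV build with GStreamer support, and the resolution and frame rate are illustrative:

```python
# Read frames from a CSI camera on the Jetson through a GStreamer pipeline.
import cv2

pipeline = (
    "nvarguscamerasrc ! "
    "video/x-raw(memory:NVMM), width=1280, height=720, framerate=30/1 ! "
    "nvvidconv ! video/x-raw, format=BGRx ! "
    "videoconvert ! video/x-raw, format=BGR ! appsink"
)

cap = cv2.VideoCapture(pipeline, cv2.CAP_GSTREAMER)
while cap.isOpened():
    ok, frame = cap.read()  # BGR frame handed to the CV pipeline (e.g. the detector)
    if not ok:
        break
cap.release()
```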
For operation #5, we could use a desktop or mobile application. We opted for a desktop application because we can easily add our own hardware to a desktop machine. The application could be built with a number of frameworks, but we ultimately chose Qt because it is written in C++, allowing our firmware team to contribute easily, and it integrates well with serial communication and video input.
Project Management Plan
Milestones
Development of Milestones
To make our milestones useful indicators that we are moving in the right direction, we tied them directly to the competition requirements. The competition requirements we follow are detailed below.
1. Our aircraft must be able to safely take off and maintain flight.
2. Our aircraft should be able to safely land under manual control.
3. Our aircraft should be able to perform basic maneuvers to fly forwards, backwards and turn.
4. Our aircraft should be able to sustain flight long enough to complete both missions. Task 2 requires a flight of approximately 3.5km at most, assuming the aircraft does not have to return, so we will set the minimum range at 4km.
5. Our aircraft should be able to provide the pilot with the ability to observe the space around the aircraft beyond visual line-of-sight.
6. Our aircraft should be able to maneuver with sufficient dexterity to get close enough to an object on the ground to retrieve it.
7. Our aircraft should have the ability to retrieve an object from the ground and fly with it loaded.
8. Our aircraft should be able to maneuver with sufficient dexterity to softly drop an object into a barrel.
9. Our aircraft should be able to scan a QR code, relay the location to the autopilot, and relay the questions described to the pilot.
10. Our aircraft should be able to fly to a given point autonomously once it is airborne.
Below, we compile our milestones with the corresponding requirements in brackets written as R1, R2, … corresponding to the numbered list above. Some requirements are broken into multiple milestones as they can be performed at various levels.
| Milestone | Date |
|---|---|
| First aircraft takeoff and landing (R1 & R2). | |
| First flight between two distinct points separated by at least 50m (R1). | |
| First flight where the aircraft is able to maneuver forwards, backwards and turn about its vertical axis (R3). | |
| First flight where the aircraft can fly between two points at least 4km apart (R4). | |
| First time we can transmit a video feed from the aircraft and display it on our ground station (R5). | |
| First time we are able to change the direction the aircraft camera is facing using movement of the gimbal and aircraft (R5). | |
| First time we are able to pick up an object sized similarly to the device (R6). | |
| First time we are able to fly between two distinct points >= 500m apart while carrying an object similar to the device (R7). | |
| First time we are able to release an object being carried, similar to the device, such that it does not hit the ground with excess force (R8). | |
| First time we are able to release an object similar to the device into a container sized similarly to the one at competition such that it does not hit the bucket with excess force or tip over (R8). | |
| First time we are able to scan a QR code (R9). | |
| First time we are able to transmit the location from the scanned QR code to the autopilot (R9). | |
| First time we are able to display the questions from the scanned QR code on the ground station (R9). | |
| First time our aircraft is able to autonomously fly between two points >= 500m apart provided the aircraft is airborne (R10). | |
Schedule
TBD, need leads
Budget
TBD, need wish list
Pickup/Dropoff: CV
To provide visibility for the pickup and dropoff operations, our drone is fitted with two Raspberry Pi Cams, one of which is mounted on a camera-pointing gimbal. This camera is connected to an OpenHD video transmission system that sends video to our ground station, allowing the pilot to view the feed on a monitor. The gimbal is connected to a gimbal controller, which receives input from the ground station via the CV computer.
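A minimal sketch of this relay is shown below; the serial port names and baud rates are placeholders for whichever links the ground-station radio and the gimbal controller actually use:

```python
# Relay gimbal commands arriving from the ground station to the gimbal
# controller. Port names and baud rates are placeholders.
import serial

ground_link = serial.Serial("/dev/ttyUSB0", 57600, timeout=0.1)   # radio from ground
gimbal_ctrl = serial.Serial("/dev/ttyUSB1", 115200, timeout=0.1)  # gimbal controller

while True:
    command = ground_link.read(ground_link.in_waiting or 1)  # raw command bytes
    if command:
        gimbal_ctrl.write(command)  # pass through unchanged
```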