3D Vision Research

The 3D vision papers and articles that were researched are analyzed to determine how they fit into the current computer vision architecture. The following factors are examined and described for each study:

Problem Being Solved	What is the focus of the research paper or article? What is it trying to solve, and why is that problem worth solving?
Thought Process	How did the researchers brainstorm their solution? What factors were taken into consideration, and how did they approach the problem?
Actions Taken	What specific steps were taken to resolve the problem? How were these actions performed, and why these actions specifically?
Analysis and Takeaways	What issues were faced, and how were they resolved? What compromises were made, and how did they affect the rest of the project?
Summary and Resources	What resources can help with the research and what are the specific tools that can be used to implement this project? What are some references or documentation that can be looked into?

Goals of the 3D Vision Research Team:

Understand the current computer vision architecture thoroughly and have a high-level understanding of how it integrates with the rest of the project, including the firmware and electrical side.
Research multiple studies on 3D vision and find key use cases as well as the benefits of incorporating 3D vision into the current computer vision architecture.
Discover how and where 3D vision would fit seamlessly into the current architecture, and document as much as possible about each study and about resources that could be useful for implementing the model.

Key Requirements:

Must be fully functional while the UAV is up to ~50 meters in the air
Ideally doesn’t require additional hardware/components to function efficiently
If external hardware such as sensors is required, the optimal placement of these components must be analyzed
Can be easily incorporated into the current architecture of the UAV
The positioning of the 3D vision model within the architecture diagram must be analyzed
The power and computational resources required for utilizing the model must be specified

Comparison of 3D Vision Models:

The researched models are analyzed for their merits and demerits, and their key takeaways are listed. This gives insight into which model is best suited to the UAV and to integration into the existing computer vision architecture.

	Research Paper	Problem Being Solved	Thought Process	Actions Taken	Analysis and Takeaways	Summary and Resources
1	ArtiBoost: Boosting Articulated 3D Hand-Object Pose Estimation via Online Exploration and Synthesis	Hand-object pose estimation (HOPE) is extremely difficult given the many possible orientations and the dexterity of the human hand. ArtiBoost attempts to address this.	It is an online data enhancement method that defines a CCV-space and generates synthetic hand-object poses from it through exploration and synthesis. The synthetic data is then fed into the model along with real data.	Complex statistics are involved in the creation of the CCV-space. However, the general idea is to train the model and feed the losses back to the exploration step.	A model trained with the added synthetic poses performs better than one trained on real-world hand-object poses alone. The synthetic poses tend to train the model better when they are more diverse rather than when they are of higher quality.	https://openaccess.thecvf.com/content/CVPR2022/papers/Yang_ArtiBoost_Boosting_Articulated_3D_Hand-Object_Pose_Estimation_via_Online_Exploration_CVPR_2022_paper.pdf
2	3D Map Building Based on Stereo Vision	Building a local and global 3D map for navigation in autonomous land vehicles (ALVs).	Employ a binocular stereo vision system that uses the parallax between two cameras to calculate depth.	A matching algorithm was used to generate a disparity map, which was then transformed into a new coordinate system for 3D map building.	The real-time global 3D map generated could be useful for mapping and navigation. However, it requires two cameras as well as GPS/INS built into the UAV.	https://www.researchgate.net/publication/224643999_3D_Map_Building_Based_on_Stereo_Vision
3	Stereoscopic First Person View System for Drone Navigation	Providing ground operators with a system to control the drone through a more immersive stereo vision experience.	The ground operators can control the UAV using a controller and a VR headset. The drone uses a low-latency stereo vision camera to stream real-time video to the VR headset.	A stereo vision camera was mounted on the UAV, and an Oculus Rift with a controller was used by the operator. Specific hardware was used for processing the video feed and sharing it with the ground operator.	The research is very in-depth and could be useful to implement on the WARG UAV. Depth estimation could be performed for data gathering, or live video could be streamed into an FPV-based control system.	https://doi.org/10.3389/frobt.2017.00011
4	A Stereo Vision Based Mapping Algorithm for Detecting Inclines, Drop-offs, and Obstacles for Safe Local Navigation	Presenting a stereo vision mapping algorithm that finds safe regions for navigation by detecting objects, inclines, and drop-off points.	A localized map must be generated with annotations that describe the surroundings of the robot. Essentially, the safe and unsafe areas must be identified so that the robot can navigate through 3D space.	Stereo vision is employed to calculate depth. The depth is then used to generate a 3D grid, which is segmented into levels and inclines. A 2D local safety map is then generated for navigating the robot’s surroundings.	The research provides good insight into the steps we must take when constructing our own 3D vision mapping model. Because it uses only a camera for mapping, the idea is easily transferable to the WARG UAV.	https://web.eecs.umich.edu/~kuipers/papers/Murarka-iros-09.pdf
5	Towards a Meaningful 3D Map Using a 3D Lidar and a Camera	Creating a semantic 3D map of an urban environment with a 3D LiDAR and a camera for robot navigation.	A 2D map with pixels would be segmented with respect to each voxel, which differs based on the LiDAR data.	The 2D map was segmented, and certain error correction methods were applied to generate a labelled and fairly accurate semantic 3D map of the environment.	This research is useful for the WARG UAV because it does a good job of labelling different parts of a 2D image from the camera. Combined with the LiDAR, it could generate useful information for mapping.	https://doi.org/10.3390%2Fs18082571
6	Map Construction Based on LiDAR Vision Inertial Multi-Sensor Fusion	Creating a high-precision global 3D map by fusing SLAM with visual images as well as LiDAR and odometer data.	Data from the live camera feed is fused with data from the LiDAR point clouds and the IMU to create a 3D global map.	After gathering all necessary data values, outlier points were removed to collect candidate points, which were then optimized using a factor graph optimization process.	This research is highly mathematical and provides many resources that would ease incorporation into the WARG UAV. The method is reported to be highly precise and to perform well.	https://www.mdpi.com/2032-6653/12/4/261/pdf
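
Several of the studies above (rows 2 to 4) rely on stereo parallax to recover depth. As a rough illustration only, the sketch below uses the standard rectified pinhole stereo relation Z = f * B / d and its first-order error estimate to show how depth uncertainty grows with range, which is relevant to the ~50 meter requirement. The focal length, baseline, and one-pixel matching error are hypothetical values chosen for illustration, not parameters of the WARG cameras, and the code is not taken from any of the cited papers.

```python
def depth_from_disparity(disparity_px, focal_px, baseline_m):
    """Depth in meters for a rectified stereo pair: Z = f * B / d."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px


def depth_error(depth_m, focal_px, baseline_m, disparity_error_px=1.0):
    """First-order depth uncertainty: dZ ~= Z^2 / (f * B) * dd."""
    return depth_m ** 2 / (focal_px * baseline_m) * disparity_error_px


if __name__ == "__main__":
    # Hypothetical camera parameters (assumptions, not WARG hardware).
    FOCAL_PX = 1000.0   # focal length expressed in pixels
    BASELINE_M = 0.20   # distance between the two cameras in meters

    for altitude_m in (5.0, 20.0, 50.0):
        # Disparity that a point at this range would produce.
        disparity = FOCAL_PX * BASELINE_M / altitude_m
        error = depth_error(altitude_m, FOCAL_PX, BASELINE_M)
        print(f"range {altitude_m:5.1f} m: disparity {disparity:6.2f} px, "
              f"depth error about {error:5.2f} m per pixel of matching error")
```

Under these assumed values, a one-pixel disparity error at 50 m corresponds to roughly 12.5 m of depth uncertainty, because the error grows with the square of the range. This suggests that camera baseline and resolution would need to be considered when evaluating any stereo-based approach against the altitude requirement.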