
Summary

Estimating 3D hand-object poses from a single image is very difficult, and many datasets lack the annotations needed to determine these orientations. ArtiBoost is a project that aims to estimate hand-object poses across diverse orientations by sampling from a Composited hand-object Configuration and View-point space (CCV-space).

Objective

The objective of the research is efficient hand-object pose estimation (HOPE). A human hand has around 21 degrees of freedom (DoF), which makes hand-object poses very difficult to estimate. ArtiBoost is an online data-enhancement method that aims to boost articulated hand-object pose estimation through exploration and synthesis.

Method

First, a discrete 3D space called the CCV-space is designed with object types, hand poses, and viewpoints as its axes. In the exploration step, ArtiBoost samples hand-object-viewpoint triplets from the CCV-space. During synthesis, the hand and object in each triplet are rendered into an image from the triplet's viewpoint. The resulting synthetic images are mixed with real-world source images in batches to train the HOPE model, and the training losses are fed back to the exploration step.
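The explore-synthesize-feedback loop above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the object list, bin counts, weight-update rule, and all function names are assumptions made for the sketch.

```python
import random

# Hypothetical sketch of ArtiBoost-style online exploration. The CCV-space
# is modelled as a discrete grid of (object, hand-pose, viewpoint) triplets;
# each triplet carries a sampling weight that is updated from the training
# loss it produced, so hard triplets get sampled more often.
objects = ["bottle", "mug"]
hand_poses = range(4)    # discretized hand-pose bins (illustrative)
viewpoints = range(6)    # discretized camera viewpoints (illustrative)

ccv_space = [(o, h, v) for o in objects for h in hand_poses for v in viewpoints]
weights = {t: 1.0 for t in ccv_space}   # start uniform

def sample_triplets(k):
    """Exploration step: draw k triplets, favouring high-weight cells."""
    ts = list(weights)
    ws = [weights[t] for t in ts]
    return random.choices(ts, weights=ws, k=k)

def update_weights(triplet, loss):
    """Feedback step: triplets that still produce high loss gain weight."""
    weights[triplet] = 0.9 * weights[triplet] + 0.1 * loss

# One iteration: sample, (render + train, mocked here), feed losses back.
for t in sample_triplets(8):
    loss = random.random()          # stand-in for the HOPE training loss
    update_weights(t, loss)
```

The synthesis and training steps are stubbed out; in the real pipeline the sampled triplets would be rendered and batched with real images before the loss is computed.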

...

Result

The results of an experiment conducted indicated that a model trained with 10% of the real-world poses and 100% of the synthetic poses had the highest accuracy in every case. Notably, a diversity of poses helps train the model more than more realistic rendering of the hand-object pose inputs.
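The best-performing data mix can be sketched as follows; the set sizes, names, and batch size are illustrative assumptions, not values from the paper.

```python
import random

# Illustrative sketch (not the paper's code): build training batches from
# 10% of a real-world pose set plus 100% of a synthetic pose set.
real = [f"real_{i}" for i in range(100)]
synthetic = [f"syn_{i}" for i in range(100)]

random.seed(0)
real_subset = random.sample(real, k=len(real) // 10)   # 10% of real poses
pool = real_subset + synthetic                          # 100% of synthetic

def make_batches(samples, batch_size):
    """Shuffle the mixed pool and split it into training batches."""
    shuffled = random.sample(samples, k=len(samples))
    return [shuffled[i:i + batch_size] for i in range(0, len(shuffled), batch_size)]

batches = make_batches(pool, batch_size=16)
```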

Inference

WARG’s video data could be pipelined into ArtiBoost to create a CCV-space which would help estimate hand-object poses. This could be useful for various applications, including hand gesture recognition. However, this model may not work when the UAV is in flight far from the object. Furthermore, the paper does not mention the computational time or resources needed to employ the model.

Summary

A real-time local 3D map-building algorithm for navigation in autonomous land vehicles (ALVs) is proposed using binocular stereo vision. A model is constructed and the error of the algorithm is analyzed based on the 2D prediction error to find a depth factor which can be used to build a local 3D map. A global 3D map can be built by integrating the local 3D map with information from an INS and a GPS.

Objective

A binocular stereo vision system (one that uses two cameras) is used, complemented by a local 3D map-building algorithm in combination with an INS and a GPS. This gives autonomous land vehicles (ALVs) information about their surroundings.

Method

Two cameras with parallel optical axes form the setup of the stereo vision system. The image pairs generated by the cameras are aligned well enough that the residual vertical disparity is under one pixel.

...

A matching algorithm is used to create a disparity map. For each pixel independently, the sum-of-squared-differences (SSD) cost is computed over candidate disparities; the minimum is taken, and a parabola fitted through the costs around it gives a sub-pixel disparity. Disjoint regions or small regions of bad matches can be removed with a simple blob-colouring algorithm. The matching algorithm runs in under 100 ms on a 2.4 GHz CPU.
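The SSD-plus-parabola step can be sketched as below. This is a minimal single-pixel-window illustration of the technique, not the paper's implementation (which aggregates costs over a window and adds the blob-colouring cleanup).

```python
import numpy as np

def ssd_disparity(left, right, max_disp):
    """Per-pixel SSD stereo matching with parabola-fit subpixel refinement.

    Illustrative sketch: `left`/`right` are rectified grayscale images
    (2D float arrays); a left pixel x is compared with right pixel x - d.
    """
    h, w = left.shape
    # Cost volume: costs[d, y, x] = squared difference at disparity d.
    costs = np.full((max_disp + 1, h, w), np.inf)
    for d in range(max_disp + 1):
        sq = np.full((h, w), np.inf)
        sq[:, d:] = (left[:, d:] - right[:, : w - d]) ** 2
        costs[d] = sq
    d_best = np.argmin(costs, axis=0).astype(float)

    # Subpixel refinement: fit a parabola through the costs at the SSD
    # minimum and its two neighbours, and take the parabola's minimum.
    for y in range(h):
        for x in range(w):
            d = int(d_best[y, x])
            if 0 < d < max_disp:
                c0, c1, c2 = costs[d - 1, y, x], costs[d, y, x], costs[d + 1, y, x]
                denom = c0 - 2.0 * c1 + c2
                if np.isfinite(denom) and denom > 0:
                    d_best[y, x] += 0.5 * (c0 - c2) / denom
    return d_best
```

Running it on a synthetically shifted image pair recovers the known shift to within half a pixel.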

Now, to build a 3D map around the ALV, a change of coordinate systems takes place. The camera coordinate system (C-CS) is converted to the ALV coordinate system (ALV-CS), where mathematical operations are performed on the 2D coordinates gathered from the image pairs. Through this conversion, a depth factor is calculated and used to build the local 3D map from the disparity map.
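A minimal sketch of the C-CS triangulation and the C-CS → ALV-CS transform is below. The focal length, baseline, principal point, and camera mounting values are assumptions for illustration, not numbers from the paper.

```python
import numpy as np

# Assumed camera parameters (illustrative only).
f = 700.0      # focal length in pixels
B = 0.30       # stereo baseline in metres

def camera_point(u, v, d, cx=320.0, cy=240.0):
    """Triangulate pixel (u, v) with disparity d into the camera
    coordinate system (C-CS), using the standard Z = f * B / d."""
    Z = f * B / d
    X = (u - cx) * Z / f
    Y = (v - cy) * Z / f
    return np.array([X, Y, Z])

# C-CS -> ALV-CS: a rigid transform given by the camera's mounting
# rotation R and offset t on the vehicle (placeholder values here).
R = np.eye(3)
t = np.array([0.0, 0.0, 1.5])   # e.g. camera 1.5 m above the ALV origin

def to_alv(p_cam):
    """Express a camera-frame point in the ALV coordinate system."""
    return R @ p_cam + t
```

A pixel at the principal point with disparity 7 triangulates to 30 m straight ahead, and the transform then offsets it by the camera's mounting position.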

...

Experimental error is calculated using physical measurements and then added to the ALV-CS coordinates.

Result

The 3D data in closer regions is dense, while data further away is sparse or even non-existent. Furthermore, any region occluded or not seen by the cameras yields no data. A global 3D map is made in a coordinate system called the world coordinate system (W-CS) by converting the ALV-CS coordinates to W-CS. The W-CS is anchored using INS and GPS measurements.
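The ALV-CS → W-CS step can be sketched as a pose transform in which the INS supplies the vehicle's orientation and the GPS its world position. The sketch below simplifies the full roll/pitch/yaw attitude to a single yaw angle; all values are illustrative assumptions.

```python
import numpy as np

def alv_to_world(p_alv, yaw, position):
    """Convert an ALV-CS point to the world coordinate system (W-CS).

    `yaw` is the vehicle heading from the INS (rotation about the
    vertical axis; full INS attitude would use roll/pitch/yaw), and
    `position` is the vehicle's world position from the GPS.
    """
    c, s = np.cos(yaw), np.sin(yaw)
    R_yaw = np.array([[c, -s, 0.0],
                      [s,  c, 0.0],
                      [0.0, 0.0, 1.0]])
    return R_yaw @ p_alv + position

# A local-map point 2 m ahead of the vehicle, with the vehicle rotated
# 90 degrees and sitting at world position (10, 5, 0):
p_world = alv_to_world(np.array([2.0, 0.0, 0.0]),
                       yaw=np.pi / 2,
                       position=np.array([10.0, 5.0, 0.0]))
```

Applying this to every point of each local map, as the INS/GPS pose changes, accumulates the global 3D map.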

Inference

The algorithm used here was built for autonomous land vehicles, so it may not be directly suitable for a UAV. However, it is worth looking into given how well it builds global 3D maps. Local 3D maps generally have less information on distant coordinates, but on integration with an INS and a GPS, the system could map the surrounding region quite well. The setup would require two cameras as well as a GPS and an INS mounted on the UAV. Another advantage of this algorithm is that it runs in real time, which means that if we incorporated it into WARG’s architecture, the live video feed could be pipelined into it and the generated 3D map fed into a new module for navigation.

In a nutshell, this model has both pros and cons but is worth looking into for 3D map building and navigation.

References

  1. https://openaccess.thecvf.com/content/CVPR2022/papers/Yang_ArtiBoost_Boosting_Articulated_Hand-Object_Pose_Estimation_via_Online_Exploration_CVPR_2022_paper.pdf, https://github.com/lixiny/ArtiBoost

  2. https://www.researchgate.net/publication/224643999_3D_Map_Building_Based_on_Stereo_Vision

  3. J. M. Saez and F. Escolano, "A global 3D map-building approach using stereo vision," in Proceedings of the 2004 IEEE International Conference on Robotics and Automation, pp. 1197-1202.

  4. M. W. M. G. Dissanayake, P. Newman, S. Clark, et al., "A solution to the simultaneous localization and map building (SLAM) problem," IEEE Transactions on Robotics and Automation, vol. 17, no. 3, Jun. 2001, pp. 229-241.

  5. Y. L. Xiong and L. Matthies, "Error analysis of a real-time stereo system," in Proceedings of the 1997 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 1087-1093.