Summary

The research paper discusses a stereo vision based mapping algorithm for modelling hazards in an urban environment for a mobile robot. It works by generating a depth map from the disparities between the stereo images; the 3D range coordinates calculated from it are then passed into a 3D grid generation algorithm to model the environment. This 3D grid is then segmented into planes, which are checked against the robot's own plane to determine whether they are traversable. From this information, a 2D local safety map is created and used for the navigation of the robot.

Objective

A wheeled mobile robot needs to detect obstacles, drop-offs, and inclines in order to navigate. The objective of the research is to present a stereo vision based mapping algorithm that uses cameras alone to detect inclines, drop-offs, and static obstacles. The mapping algorithm identifies safe and unsafe regions in the robot's local surroundings, allowing the robot to navigate them safely.

Method

A 6-DOF SLAM algorithm is employed to provide the robot's pose to the mapping algorithm. A disparity map is created from each image pair returned by the stereo camera setup. During post-processing, range readings that differ significantly from their neighbouring readings are removed, as they are likely to be incorrect. This localization step estimates the robot's pose in 3D space and returns a set of 3D range points in the global coordinate frame.
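
As a rough illustration of this stage, the sketch below recovers metric depth from a disparity map and rejects readings that disagree with their neighbours. The focal length, baseline, and deviation threshold are illustrative assumptions, not values from the paper.

```python
import numpy as np

def disparity_to_depth(disparity, focal_px=700.0, baseline_m=0.12):
    """Recover metric depth from disparity: Z = f * B / d.
    The focal length and baseline are placeholder rig parameters."""
    depth = np.full(disparity.shape, np.nan)
    valid = disparity > 0            # zero disparity = no stereo match
    depth[valid] = focal_px * baseline_m / disparity[valid]
    return depth

def reject_outliers(depth, max_dev_m=0.5):
    """Drop readings that differ sharply from the median of their
    4-neighbours, mirroring the post-processing step (threshold assumed).
    np.roll wraps at the image border, which is fine for a sketch."""
    shifted = [np.roll(depth, s, axis=a) for a in (0, 1) for s in (-1, 1)]
    neighbour_med = np.nanmedian(np.stack(shifted), axis=0)
    cleaned = depth.copy()
    cleaned[np.abs(depth - neighbour_med) > max_dev_m] = np.nan
    return cleaned
```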

...

The next step involves updating a 3D model, consisting of a 3D grid and a 3D point cloud, with the new 3D range point readings using an occupancy algorithm. For each 3D range point, a ray is cast from the camera to the point, and the voxels along the ray have their log odds probability (LOP) of occupancy updated: each voxel starts with a LOP of zero, the voxel containing the range point is incremented, and the voxels the ray passes through on the way are decremented, since the ray implies they are free space. Each voxel's point list is ordered by distance from the camera, and voxels with a high probability of occupancy are marked in the 3D occupancy grid.
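
The paper's exact update constants are not reproduced here; the sketch below shows a generic log-odds ray update of the kind described, with assumed increment/decrement values and voxel size.

```python
import numpy as np

LOG_ODDS_HIT  = 0.85    # assumed increment for the endpoint voxel
LOG_ODDS_MISS = -0.40   # assumed decrement for voxels the ray passes through
VOXEL_SIZE    = 0.10    # assumed grid resolution in metres

def update_ray(grid, camera, point):
    """Cast a ray from the camera to a 3D range point and update the
    log odds probability (LOP) of each voxel it touches. `grid` is a
    dict mapping voxel index tuples to LOP values (all start at zero)."""
    direction = point - camera
    steps = max(int(np.linalg.norm(direction) / VOXEL_SIZE), 1)
    for i in range(steps):                        # free space along the ray
        pos = camera + direction * (i / steps)
        voxel = tuple((pos // VOXEL_SIZE).astype(int))
        grid[voxel] = grid.get(voxel, 0.0) + LOG_ODDS_MISS
    end = tuple((point // VOXEL_SIZE).astype(int))
    grid[end] = grid.get(end, 0.0) + LOG_ODDS_HIT  # occupied endpoint

# Voxels whose LOP exceeds a threshold are then treated as occupied, e.g.:
# occupied = {v for v, lop in grid.items() if lop > 2.0}
```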

...

In this step, the 3D model is segmented into traversable ground regions, and planes are fitted to the points of each segment using the linear least squares algorithm.
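
A minimal sketch of this plane-fitting step, assuming planes of the form z = ax + by + c and using NumPy's linear least squares solver:

```python
import numpy as np

def fit_plane(points):
    """Fit z = a*x + b*y + c to an (N, 3) array of a segment's points
    with linear least squares; returns the coefficients and unit normal."""
    A = np.column_stack([points[:, 0], points[:, 1], np.ones(len(points))])
    (a, b, c), *_ = np.linalg.lstsq(A, points[:, 2], rcond=None)
    normal = np.array([-a, -b, 1.0])   # normal of a*x + b*y - z + c = 0
    return (a, b, c), normal / np.linalg.norm(normal)
```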

...

Next, a local safety map is built. The segment on which the robot is located is assumed to be safe. Based on the 3D grid, all regions reachable by the robot from its original segment are labelled Safe; all other regions are labelled Non-ground. This is done by labelling individual segments as Level, Inclined, or Unknown. The boundary cells of segments labelled Level or Inclined are checked against Unknown boundary cells to detect drop-off edges. Furthermore, mathematical parameters are used to calculate whether a given incline is navigable by the robot. The output of this step is a 2D local safety map annotated with these safety labels.
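
As an illustration of the incline check, the sketch below classifies a segment from the tilt of its fitted plane normal; the slope thresholds are assumed values, not the paper's parameters.

```python
import numpy as np

LEVEL_TOL_DEG = 2.0    # assumed tolerance for calling a plane level
MAX_SLOPE_DEG = 15.0   # assumed steepest incline the robot can climb

def classify_segment(normal):
    """Label a segment from the angle between its fitted plane normal
    and the vertical axis."""
    tilt = np.degrees(np.arccos(abs(normal[2]) / np.linalg.norm(normal)))
    if tilt < LEVEL_TOL_DEG:
        return "Level"
    if tilt <= MAX_SLOPE_DEG:
        return "Inclined"      # steep, but still navigable by the robot
    return "Non-ground"        # too steep to traverse
```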

Result

The error rate of the stereo safety map was analyzed against a ground truth safety map. The false negative (FN) rates are very low; however, the false positive (FP) rate is quite high, which means the robot may detect an obstacle where none exists.

...

Furthermore, the algorithm requires several frames to detect objects beyond 4 to 5 metres, which makes it unreliable for detecting objects beyond that range.

Inference

This research paper gives a clear picture of how a stereo vision based mapping model can be implemented in stages. While the algorithm is not well suited to detecting objects beyond 4-5 metres or to airborne use, it is a good example of how to approach 3D vision mapping algorithm development. Because it uses a stereo camera based system, it could be implemented on the WARG UAV fairly easily. Some of the paper's steps could be combined with other mapping steps to produce a suitable final mapping algorithm.

References

...

https://web.eecs.umich.edu/~kuipers/papers/Murarka-iros-09.pdf

...

A. Murarka, M. Sridharan, and B. Kuipers, “Detecting obstacles and drop-offs using stereo and motion cues for safe local motion,” in IROS, 2008.

...

N. Heckman, J. Lalonde, N. Vandapel, and M. Hebert, “Potential negative obstacle detection by occlusion labeling,” in IROS, 2007.

...

J.-S. Gutmann, M. Fukuchi, and M. Fujita, “3D perception and environment map generation for humanoid robot navigation,” Intl. Journal of Robotics Research, 2008.

...

A. Rankin, A. Huertas, and L. Matthies, “Evaluation of stereo vision obstacle detection algorithms for off-road autonomous navigation,” in 32nd AUVSI Symposium on Unmanned Systems, 2005.

...

This study proposes a method for creating semantic 3D maps by combining a 3D LiDAR and a camera, complemented with semantic mapping and map refinement. A GPS and IMU are used for localization, and error reduction methods are applied to produce a semantic 3D map with fewer errors. The results of this study can be used for drone navigation and surveying.

Objective

Semantic 3D maps are required for multiple purposes, including UAV navigation. However, camera-only approaches are inefficient in large scale environments, due to both the computational power required and the sparse information extracted. The goal of this research is to combine a 3D LiDAR with a camera for semantic mapping.

Semantic 3D mapping involves reconstructing a UAV's environment in 3D space and embedding semantic information into the map. Unlike a geometric map, a semantic map contains additional information that helps a robot better understand its surroundings and perform high-level operations.

...

3D semantic mapping involves segmentation, where a point cloud is taken as input and semantic labels are assigned to each point. In 2D semantic segmentation, semantic labels are instead assigned to each pixel. In this study, RefineNet, an open source 2D semantic segmentation network, was used.

A simultaneous localization and mapping (SLAM) algorithm is used to transfer the 2D semantic information into 3D grids, creating a semantic 3D map. This study uses only seven labels for classification: road, sidewalk, building, fence, pole, vegetation, and vehicle.
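
A minimal sketch of this 2D-to-3D label transfer, assuming a standard pinhole camera model with known calibration (the intrinsic matrix K and the world-to-camera transform are placeholders, not the study's actual setup):

```python
import numpy as np

LABELS = ["road", "sidewalk", "building", "fence", "pole",
          "vegetation", "vehicle"]

def transfer_labels(points_world, label_image, K, T_cam_world):
    """Project 3D points into the camera and read each point's 2D
    semantic label. K (3x3 intrinsics) and T_cam_world (4x4 world-to-
    camera transform) are assumed known from calibration."""
    n = len(points_world)
    homog = np.hstack([points_world, np.ones((n, 1))])
    cam = (T_cam_world @ homog.T).T[:, :3]
    labels = np.full(n, -1, dtype=int)          # -1 = not visible
    in_front = cam[:, 2] > 0                    # only points ahead of camera
    pix = (K @ cam[in_front].T).T
    u = (pix[:, 0] / pix[:, 2]).astype(int)
    v = (pix[:, 1] / pix[:, 2]).astype(int)
    h, w = label_image.shape
    ok = (u >= 0) & (u < w) & (v >= 0) & (v < h)
    idx = np.flatnonzero(in_front)[ok]
    labels[idx] = label_image[v[ok], u[ok]]
    return labels
```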

A GPS and IMU are used to estimate the odometry of the system.

Method

The semantic 3D map is generated in the following manner:

...

The research involves semantic mapping followed by post-processing map refinement. Using odometry data from the GPS and IMU, a global 3D map is generated.
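
A minimal sketch of how each local scan could be placed into the global frame using the GPS/IMU pose estimate (the rotation and translation are assumed to come from the odometry):

```python
import numpy as np

def scan_to_world(scan_points, R_world_body, t_world_body):
    """Transform an (N, 3) local LiDAR scan into the global frame using
    the pose estimated from GPS/IMU odometry (rotation R, translation t)."""
    return scan_points @ R_world_body.T + t_world_body
```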

Camera-based 2D segmentation is performed to generate the 3D semantic map; this is quicker and more performant than 3D semantic segmentation.

...

For semantic mapping, coordinate alignment is performed, where the 3D LiDAR's data is used to map each voxel to each pixel. To minimize errors, a clustering method and a random forest classifier are used. A probability distribution algorithm then assigns a semantic label to each voxel.
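
The study's probability distribution update is not reproduced here; the sketch below uses a simple per-voxel label histogram to illustrate the idea of fusing repeated 2D observations into a single voxel label.

```python
import numpy as np
from collections import defaultdict

NUM_LABELS = 7   # road, sidewalk, building, fence, pole, vegetation, vehicle

# Per-voxel label histogram; keying voxels by index tuple and counting
# raw observations is an illustrative simplification of the paper's update.
voxel_counts = defaultdict(lambda: np.zeros(NUM_LABELS))

def observe(voxel, label):
    """Record one projected 2D-segmentation observation for a voxel."""
    voxel_counts[voxel][label] += 1

def voxel_label(voxel):
    """Return the most probable label: argmax of the normalized histogram."""
    counts = voxel_counts[voxel]
    return int(np.argmax(counts / counts.sum()))
```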

An algorithm is used to identify noise voxels, that is, outlier voxels that may have been wrongly segmented. These are then rectified against the previously generated semantic map to produce a final semantic map with fewer errors.
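
The noise-detection algorithm itself is not detailed in this summary; a simple neighbourhood majority filter, sketched below with an assumed agreement threshold, illustrates how outlier voxels could be relabelled.

```python
from collections import Counter

def relabel_noise_voxels(labels, min_agreement=0.5):
    """Relabel voxels that disagree with most of their 26 neighbours.
    `labels` maps (i, j, k) voxel indices to integer labels; the
    agreement threshold is an assumed parameter."""
    offsets = [(dx, dy, dz)
               for dx in (-1, 0, 1) for dy in (-1, 0, 1) for dz in (-1, 0, 1)
               if (dx, dy, dz) != (0, 0, 0)]
    cleaned = dict(labels)
    for (i, j, k), lab in labels.items():
        neigh = [labels[(i + dx, j + dy, k + dz)] for dx, dy, dz in offsets
                 if (i + dx, j + dy, k + dz) in labels]
        if not neigh:
            continue
        common, count = Counter(neigh).most_common(1)[0]
        if common != lab and count / len(neigh) >= min_agreement:
            cleaned[(i, j, k)] = common
    return cleaned
```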

Result

The KITTI dataset was used for the experiments. In the qualitative evaluation, it was noted that parts of the map containing classes absent from the 2D semantic segmentation training set were assigned whichever pre-trained label they most closely resembled.

Inference

This is a very good study for WARG's specific use case. One caveat is that it targets urban environments; however, it does a good job of estimating and semantically classifying the environment. For WARG's use case, it could start with something basic, like differentiating the ground from inclined or elevated surfaces, and build from there. A disadvantage is that this approach requires a GPS and IMU, along with a filtering-based state estimation process with centimetre-level granularity, for the 3D mapping process. However, it offers a lot of information and resources that could be carried over to the WARG UAV.

References

  1. https://doi.org/10.3390/s18082571

  2. https://www.ri.cmu.edu/pub_files/2014/7/Ji_LidarMapping_RSS2014_v8.pdf

  3. https://doi.org/10.1109/CVPR.2012.6248007