Towards a Meaningful 3D Map Using a 3D Lidar and a Camera

Summary

This study proposes a method for building semantic 3D maps by combining a 3D LiDAR with a camera, in two stages: semantic mapping followed by map refinement. A GPS and an IMU are used for localization, and several error-reduction methods yield a semantic 3D map with fewer errors. The results of this study can be applied to drone navigation and surveying.

Objective

Semantic 3D maps are required for multiple purposes, including UAV navigation. However, camera-based approaches are inefficient in large-scale environments because of both the computational power they require and the limited information they extract. The goal of this research is to combine a 3D LiDAR with a camera for semantic mapping.

Semantic 3D mapping involves reconstructing a UAV’s environment in 3D space and embedding semantic information into the map. Unlike a purely geometric map, a semantic map carries additional information that helps a robot better understand its surroundings and perform high-level operations.

3D semantic mapping involves segmentation, where a point cloud is taken as input and a semantic label is assigned to each point. In 2D semantic segmentation, labels are instead assigned to each pixel. In this study, RefineNet, an open-source 2D semantic segmentation tool, was used.
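The difference between the two kinds of segmentation output can be sketched with array shapes; the sizes below (`H`, `W`, `N`) are arbitrary illustrative assumptions, not values from the paper:

```python
import numpy as np

# Illustrative sizes (assumptions, not from the paper).
H, W = 4, 6          # image height and width
N = 10               # number of LiDAR points

# 3D semantic segmentation: one class label per point in the cloud.
points = np.random.rand(N, 3)               # (x, y, z) per point
point_labels = np.zeros(N, dtype=int)       # one class id per point

# 2D semantic segmentation (e.g. RefineNet): one class label per pixel.
image = np.random.rand(H, W, 3)             # RGB image
pixel_labels = np.zeros((H, W), dtype=int)  # one class id per pixel
```

The key takeaway is that 2D segmentation produces a dense label image aligned with the camera frame, which later has to be projected into the 3D map.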

A simultaneous localization and mapping (SLAM) algorithm is used to transfer the 2D semantic information into a 3D voxel grid, which yields the semantic 3D map. The study uses only seven classification labels: road, sidewalk, building, fence, pole, vegetation, and vehicle.
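The seven labels can be collected into a name-to-id mapping for use in the later voxel-labeling steps; the integer ordering below is an assumption made for illustration, not the paper's encoding:

```python
# The seven semantic classes named in the paper; the id ordering is
# an illustrative assumption.
LABELS = ["road", "sidewalk", "building", "fence",
          "pole", "vegetation", "vehicle"]
LABEL_ID = {name: i for i, name in enumerate(LABELS)}
```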

A GPS and an IMU are used to estimate the odometry of the system.
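The summary does not give the fusion details, but as a hedged sketch, a complementary-filter-style blend of a GPS position fix with an IMU dead-reckoned position could look like the following (`fuse_odometry` and the `alpha` weight are illustrative assumptions, not the paper's method):

```python
import numpy as np

def fuse_odometry(gps_pos, imu_pos, alpha=0.98):
    """Blend an IMU dead-reckoned position with a GPS fix.

    alpha weights the smooth but drifting IMU estimate; the GPS term
    anchors the result globally. alpha=0.98 is an arbitrary tuning
    value for illustration.
    """
    return alpha * np.asarray(imu_pos, dtype=float) \
        + (1.0 - alpha) * np.asarray(gps_pos, dtype=float)

# IMU has drifted slightly from the GPS fix; the fused estimate stays
# close to the IMU value but is pulled toward the GPS position.
fused = fuse_odometry(gps_pos=[10.0, 5.0, 2.0],
                      imu_pos=[10.4, 5.2, 2.1])
```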

Method

The semantic 3D map is generated in the following manner:

The research involves semantic mapping followed by post-processing map refinement. Using odometry data from the GPS and IMU, a global 3D map is generated.

Camera-based 2D segmentation is performed and the results are lifted into a 3D semantic map. This is faster and less computationally demanding than performing semantic segmentation directly on the 3D point cloud.

For semantic mapping, coordinate alignment is performed: the 3D LiDAR data is used to associate each voxel with its corresponding image pixels. To reduce errors, a clustering method and a random forest classifier are used. A probability-distribution algorithm then assigns a semantic label to each voxel.
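As a hedged sketch of the alignment and labeling steps, assuming a pinhole camera model: the intrinsics (`fx`, `fy`, `cx`, `cy`) below are made-up illustrative values, and the histogram vote is a simplified stand-in for the paper's probability-distribution step:

```python
import numpy as np

# Illustrative pinhole intrinsics (made-up values, not from the paper).
fx, fy, cx, cy = 718.0, 718.0, 607.0, 185.0

def project_to_pixel(p_cam):
    """Project a 3D point in the camera frame to pixel coordinates
    with the pinhole model: u = fx*x/z + cx, v = fy*y/z + cy."""
    x, y, z = p_cam
    if z <= 0:
        return None  # point is behind the camera; no valid pixel
    return int(round(fx * x / z + cx)), int(round(fy * y / z + cy))

def assign_voxel_label(pixel_labels_for_voxel, num_classes=7):
    """Assign a voxel the most frequent label among the 2D labels
    projected into it (a simple histogram vote standing in for the
    paper's probability-distribution algorithm)."""
    counts = np.bincount(pixel_labels_for_voxel, minlength=num_classes)
    return int(np.argmax(counts))

uv = project_to_pixel([1.0, 0.5, 10.0])          # pixel hit by this point
label = assign_voxel_label(np.array([2, 2, 5, 2]))  # mostly class 2
```

In the actual pipeline the point would first be transformed from the LiDAR frame into the camera frame using the sensors' extrinsic calibration; that transform is omitted here for brevity.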

An algorithm is used to detect noise voxels, that is, outlier voxels that may have been wrongly segmented. These corrections are merged with the previously generated semantic map to produce a final semantic map with fewer errors.
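The noise-detection step could be sketched as a neighbor-agreement heuristic; this is an illustrative stand-in for the paper's algorithm, and the `agreement` threshold is an arbitrary assumption:

```python
import numpy as np

def is_noise_voxel(voxel_label, neighbor_labels, agreement=0.5):
    """Flag a voxel as noise when fewer than `agreement` of its
    neighbors share its label (a simple outlier heuristic standing in
    for the paper's noise-voxel detection)."""
    neighbor_labels = np.asarray(neighbor_labels)
    if neighbor_labels.size == 0:
        return False  # no neighbors, nothing to compare against
    share = np.mean(neighbor_labels == voxel_label)
    return bool(share < agreement)

# A 'vehicle' voxel (id 6) surrounded mostly by 'road' neighbors (id 0)
# is suspect and would be corrected in the refined map.
noisy = is_noise_voxel(6, [0, 0, 0, 0, 6, 0])
```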

Result

Experiments were performed on the KITTI dataset. In qualitative evaluation, it was noted that parts of the map belonging to classes absent from the 2D segmentation training set were assigned the pre-trained label they most closely resembled.

Inference

This study fits WARG’s specific use case well. One caveat is that it targets urban environments; even so, it does a good job of estimating and semantically classifying the environment. For WARG’s use case, a starting point could be something basic, such as differentiating the ground from inclined or elevated surfaces, and building up from there. A disadvantage is that this approach requires a GPS and an IMU, relying on a filtering-based state estimation process with centimetre-level accuracy for the 3D mapping. Nevertheless, it offers a lot of information and resources that could be carried over to the WARG UAV.

References

  1. Towards a Meaningful 3D Map Using a 3D Lidar and a Camera

  2. https://www.ri.cmu.edu/pub_files/2014/7/Ji_LidarMapping_RSS2014_v8.pdf

  3. https://doi.org/10.1109/CVPR.2012.6248007