Joint Semantic Segmentation Using Representations of Lidar Point Clouds and Camera Images

13 Pages Posted: 22 Apr 2023

Yue Wu

Xidian University

Jiaming Liu

Xidian University

Maoguo Gong

Xidian University

Qiguang Miao

Xidian University

Wenping Ma

Xidian University

Abstract

LiDAR and cameras are two common vision sensors in real-world systems, producing complementary point cloud and image data. Multimodal data has so far been used mostly for 3D detection and tracking; we instead study large-scale semantic segmentation through multimodal data fusion, rather than through knowledge transfer or distillation alone. We show that fusing LiDAR features with camera features while abandoning the strict point-to-pixel hard correlation can lead to better performance. Even so, the two modalities remain hard to exploit fully because their data patterns differ significantly. To address this issue, we propose Joint Semantic Segmentation (JoSS), a LiDAR-camera fusion solution that employs the attention mechanism to explore the potential relationships between point clouds and images. Specifically, JoSS consists of commonly used 3D and 2D backbones and lightweight transformer decoders for point clouds and images. The point cloud decoder uses queries to analyze the semantics of LiDAR features, and the image decoder adaptively fuses these queries with the corresponding image features. Both decoders exploit contextual information, thereby mining multimodal information for semantic segmentation. In addition, we propose an effective unimodal data augmentation (UDA) method that performs cross-modal contrastive learning on point clouds and images; it significantly improves accuracy by augmenting the point cloud alone, avoiding the complexity of generating paired samples of both modalities. JoSS achieves advanced results on two widely used large-scale benchmarks, SemanticKITTI and nuScenes-lidarseg.
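As a rough illustration of what replacing a hard point-to-pixel correspondence with attention might look like, the following minimal sketch lets each query attend softly over all image features instead of reading a single projected pixel. It is not the authors' implementation: the function names (`cross_attend`, `softmax`, `dot`), the toy feature vectors, the single attention head, the residual update, and the absence of learned projections are all assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def cross_attend(queries, image_feats):
    """Single-head scaled dot-product cross-attention (hypothetical sketch).

    Each query (standing in for a semantic query derived from LiDAR features)
    is compared with every image feature, so the correspondence is a soft
    weighting rather than a fixed point-to-pixel lookup. No learned
    projections are applied; keys and values are the raw image features.
    """
    dim = len(queries[0])
    scale = 1.0 / math.sqrt(dim)
    fused, all_weights = [], []
    for q in queries:
        weights = softmax([dot(q, k) * scale for k in image_feats])
        # Weighted sum of image features: the image context for this query.
        context = [
            sum(w * f[d] for w, f in zip(weights, image_feats))
            for d in range(dim)
        ]
        # Residual update: keep the original query and add the image context.
        fused.append([qi + ci for qi, ci in zip(q, context)])
        all_weights.append(weights)
    return fused, all_weights


# Toy inputs (assumed values): two queries and three image features.
queries = [[1.0, 0.0], [0.0, 1.0]]
image_feats = [[2.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
fused, weights = cross_attend(queries, image_feats)
```

In this toy setup each query places most of its weight on the image feature it is best aligned with, while still drawing some context from the others.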

Keywords: Joint 3D-2D learning, Contrastive learning, Information fusion, Large-scale semantic segmentation, Point cloud segmentation

Suggested Citation

Wu, Yue and Liu, Jiaming and Gong, Maoguo and Miao, Qiguang and Ma, Wenping, Joint Semantic Segmentation Using Representations of Lidar Point Clouds and Camera Images. Available at SSRN: https://ssrn.com/abstract=4426132 or http://dx.doi.org/10.2139/ssrn.4426132

Yue Wu

Xidian University

Xi'an Chang'an two hundred ten National Road
Xian
China

Jiaming Liu

Xidian University

Xi'an Chang'an two hundred ten National Road
Xian
China

Maoguo Gong (Contact Author)

Xidian University

Xi'an Chang'an two hundred ten National Road
Xian
China

Qiguang Miao

Xidian University

Xi'an Chang'an two hundred ten National Road
Xian
China

Wenping Ma

Xidian University

Xi'an Chang'an two hundred ten National Road
Xian
China
