SalsaNext+: a multimodal-based point cloud semantic segmentation with range and RGB images

Bibliographic Details
Authors: Sánchez García, Fabio; Montiel Marín, Santiago (ORCID: 0009-0000-5492-0839); Antunes García, Miguel (ORCID: 0009-0008-5627-5325); Gutiérrez Moreno, Rodrigo; Llamazares Llamazares, Ángel (ORCID: 0000-0001-8273-5163); Bergasa Pascual, Luis Miguel (ORCID: 0000-0002-0087-3077)
Resource type: article
Publication date: 2025
Country: Spain
Institution: Universidad de Alcalá (UAH)
Repository: e_Buah Biblioteca Digital Universidad de Alcalá
Language: English
OAI Identifier: oai:ebuah.uah.es:10017/65779
Online access: http://hdl.handle.net/10017/65779
https://dx.doi.org/10.1109/ACCESS.2025.3559580
Access level: open access
Keywords: Autonomous driving
Camera
LiDAR
Point cloud semantic segmentation
Sensor fusion
Electronics
Description
Summary: Advances in sensor fusion techniques are redefining the landscape of 3D point cloud semantic segmentation, particularly for autonomous driving applications. We propose an enhanced approach that leverages the complementary strengths of LiDAR and multi-camera systems. This study introduces two extensions to the state-of-the-art, LiDAR-only SalsaNext model: SalsaNext+RGB, which integrates RGB data into range-view (RV) images, and SalsaNext+PANO, which incorporates panoramic images built from multi-camera setups. The proposed methods are evaluated on the SemanticKITTI and Panoptic nuScenes datasets and show notable improvements in segmentation accuracy. Results indicate that RGB fusion boosts performance with minimal added latency, while panoramic integration offers additional gains at the expense of a higher computational load. Comparative analyses highlight significant mIoU gains, demonstrating the potential of multimodal sensor fusion for understanding intricate driving scenes.
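
The summary mentions integrating RGB data into range-view (RV) images. As a rough illustration only, the sketch below shows one common way to build such an image: a spherical projection of LiDAR points onto a 2D grid, with each cell storing the range plus an RGB triple. It is not taken from the article. The function name `project_to_range_view`, the grid size, the vertical field-of-view values, the assumption that each point already carries a colour (or `None` when no camera sees it), and the zero-fill for uncoloured points are all hypothetical choices made for this example.

```python
import math


def project_to_range_view(points, colors, height=64, width=2048,
                          fov_up_deg=3.0, fov_down_deg=-25.0):
    """Project LiDAR points onto a height x width range-view grid.

    points: list of (x, y, z) tuples in the sensor frame, in metres.
    colors: list of (r, g, b) tuples, or None for points without a colour.
    Returns a grid whose cells are [range, r, g, b], or None when empty.
    All defaults are assumptions for this sketch, not values from the article.
    """
    fov_up = math.radians(fov_up_deg)
    fov_down = math.radians(fov_down_deg)
    fov = fov_up - fov_down  # total vertical field of view

    image = [[None] * width for _ in range(height)]
    for (x, y, z), rgb in zip(points, colors):
        rng = math.sqrt(x * x + y * y + z * z)
        if rng == 0.0:
            continue  # skip degenerate points at the sensor origin

        yaw = math.atan2(y, x)      # horizontal angle
        pitch = math.asin(z / rng)  # vertical angle

        # Normalise both angles to [0, 1], then scale to pixel indices.
        u = 0.5 * (1.0 - yaw / math.pi)
        v = (fov_up - pitch) / fov
        col = min(width - 1, max(0, int(u * width)))
        row = min(height - 1, max(0, int(v * height)))

        # When several points fall in one cell, keep the closest one.
        cell = image[row][col]
        if cell is None or rng < cell[0]:
            red, green, blue = rgb if rgb is not None else (0.0, 0.0, 0.0)
            image[row][col] = [rng, red, green, blue]
    return image
```

In this sketch the colour channels are simply stacked next to the range channel, so a network that consumes range-view input would see extra input channels rather than a separate image branch. How the article actually fuses the two modalities is described in the full text, not here.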