Spatiotemporal fusion-based multimodal road feature extraction for 3D visual perception
Abstract
Three-dimensional visual perception, a core technology of intelligent driving systems, constructs geometrically and semantically rich vectorized scene representations by fusing multimodal sensor data, including LiDAR point clouds, camera images, and radar signals. This paper proposes a spatiotemporal fusion-based multimodal road feature parsing framework that combines transformer architectures with bird's-eye view (BEV) representation learning to build a road feature extraction system. The system employs a multi-scale feature pyramid to extract features from heterogeneous sensor data and uses attention mechanisms to align multi-perspective features and transform them into BEV space. Furthermore, a spatiotemporal fusion method is introduced to adaptively integrate multi-frame observations, improving detection accuracy and recall. The framework can be applied in offline automated annotation pipelines to generate training ground truth for onboard online perception models. Experimental results on our proprietary autonomous driving dataset demonstrate that the framework achieves higher precision and recall in lane marking and road boundary detection than conventional approaches.
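As a rough illustration of the attention-based BEV transformation and adaptive multi-frame fusion summarized above, the following PyTorch sketch shows one possible realization: learnable BEV queries cross-attend to flattened multi-view image features, and a per-cell gate blends the current BEV grid with the previous frame's BEV grid. All module names, dimensions, and the gating scheme are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class BEVTemporalFusion(nn.Module):
    """Illustrative sketch (not the paper's code): BEV queries attend to
    multi-view image features, then a gated update adaptively fuses the
    current BEV grid with the previous frame's BEV grid."""

    def __init__(self, embed_dim=256, bev_h=50, bev_w=50, num_heads=8):
        super().__init__()
        self.bev_h, self.bev_w = bev_h, bev_w
        # One learnable query per BEV cell (hypothetical design choice).
        self.bev_queries = nn.Parameter(torch.randn(bev_h * bev_w, embed_dim))
        # Cross-attention from BEV queries to flattened multi-view features.
        self.cross_attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
        # Gate deciding, per cell, how much of the previous BEV to keep.
        self.temporal_gate = nn.Sequential(
            nn.Linear(2 * embed_dim, embed_dim), nn.Sigmoid()
        )

    def forward(self, img_feats, prev_bev=None):
        # img_feats: (B, N_tokens, C) image features flattened over cameras/scales.
        b = img_feats.shape[0]
        queries = self.bev_queries.unsqueeze(0).expand(b, -1, -1)
        bev, _ = self.cross_attn(queries, img_feats, img_feats)  # (B, H*W, C)
        if prev_bev is not None:
            # Adaptive multi-frame fusion: gate blends current and previous
            # BEV features (ego-motion alignment omitted in this sketch).
            gate = self.temporal_gate(torch.cat([bev, prev_bev], dim=-1))
            bev = gate * bev + (1.0 - gate) * prev_bev
        return bev  # reshape to (B, C, H, W) downstream for road-feature heads


if __name__ == "__main__":
    model = BEVTemporalFusion()
    feats_t0 = torch.randn(2, 1000, 256)  # dummy multi-view features, frame t-1
    feats_t1 = torch.randn(2, 1000, 256)  # dummy multi-view features, frame t
    bev_t0 = model(feats_t0)
    bev_t1 = model(feats_t1, prev_bev=bev_t0)
    print(bev_t1.shape)  # torch.Size([2, 2500, 256])
```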