WU Wenxuan, MENG Weiliang, ZHANG Xiaopeng. Spatiotemporal fusion-based multimodal road feature extraction for 3D visual perception[J]. Chinese Journal of Stereology and Image Analysis, 2025, 30(1): 93-101. DOI: 10.13505/j.1007-1482.2025.30.01.009

Spatiotemporal fusion-based multimodal road feature extraction for 3D visual perception

Three-dimensional visual perception, a core technology of intelligent driving systems, constructs geometrically and semantically rich vectorized scene representations by fusing multimodal sensor data, including LiDAR point clouds, camera images, and radar signals. This paper proposes a spatiotemporal fusion-based multimodal road feature parsing framework that combines transformer architectures with bird's-eye view (BEV) representation learning to build a road feature extraction system. The system employs a multi-scale feature pyramid to extract features from heterogeneous sensor data and uses attention mechanisms to align multi-perspective features and transform them into BEV space. A spatiotemporal fusion method further enables adaptive integration of multi-frame observations, improving detection precision and recall. The system can be widely applied in offline automated annotation pipelines to generate training ground truth for on-vehicle online perception models. Experimental results on our proprietary autonomous driving dataset demonstrate that the framework achieves higher precision and recall in lane marking and road boundary detection than conventional approaches.
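The abstract describes the pipeline only at a high level. The sketch below illustrates, under loose assumptions, what two of the stages it names could look like in PyTorch: lifting multi-camera features into BEV space with cross-attention, and adaptively fusing BEV maps across frames. The module names (BEVCrossAttentionLift, TemporalBEVFusion), tensor shapes, and the gating scheme are hypothetical illustrations, not the authors' implementation.

```python
# Minimal sketch (not the paper's code) of cross-attention BEV lifting and
# adaptive multi-frame BEV fusion. Shapes and hyperparameters are assumptions.

import torch
import torch.nn as nn


class BEVCrossAttentionLift(nn.Module):
    """Learnable BEV queries attend over flattened multi-camera features."""

    def __init__(self, bev_h=50, bev_w=50, dim=128, heads=4):
        super().__init__()
        self.bev_query = nn.Parameter(torch.randn(bev_h * bev_w, dim))
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.bev_h, self.bev_w = bev_h, bev_w

    def forward(self, cam_feats):
        # cam_feats: (B, num_cams, C, H, W) image features from a backbone/FPN
        b, n, c, h, w = cam_feats.shape
        kv = cam_feats.permute(0, 1, 3, 4, 2).reshape(b, n * h * w, c)
        q = self.bev_query.unsqueeze(0).expand(b, -1, -1)
        bev, _ = self.attn(q, kv, kv)            # (B, bev_h*bev_w, C)
        return bev.transpose(1, 2).reshape(b, c, self.bev_h, self.bev_w)


class TemporalBEVFusion(nn.Module):
    """Gated per-cell fusion of the current BEV map with a history map."""

    def __init__(self, dim=128):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Conv2d(2 * dim, dim, 3, padding=1), nn.Sigmoid()
        )

    def forward(self, bev_now, bev_prev):
        # bev_prev is assumed already ego-motion-compensated into the
        # current frame; the learned gate weights each BEV cell adaptively.
        g = self.gate(torch.cat([bev_now, bev_prev], dim=1))
        return g * bev_now + (1 - g) * bev_prev


if __name__ == "__main__":
    lift = BEVCrossAttentionLift()
    fuse = TemporalBEVFusion()
    cams = torch.randn(1, 6, 128, 16, 44)        # 6 cameras, toy feature maps
    bev_t = lift(cams)
    bev_prev = lift(torch.randn_like(cams))      # stand-in for previous frame
    fused = fuse(bev_t, bev_prev)
    print(fused.shape)                           # torch.Size([1, 128, 50, 50])
```

The gated blend is one plausible reading of "adaptive integration of multi-frame observational data": rather than averaging frames uniformly, a learned gate decides per BEV cell how much to trust the current observation versus the history, which helps when parts of the road are momentarily occluded.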
