Abstract
Exploring millimeter-wave radar data as a complement to RGB images for improving 3D object detection has become an emerging trend in autonomous driving systems. However, existing radar-camera fusion methods are highly dependent on prior camera detection results, rendering the overall performance unsatisfactory. In this paper, we propose a bidirectional fusion scheme in the bird's-eye view (BEV-radar) that is independent of prior camera detection results. For the features of the two modalities, which come from different domains, BEV-radar designs a bidirectional attention-based fusion strategy. Specifically, following BEV-based 3D detection methods, our method engages a bidirectional transformer to embed information from both modalities and enforces local spatial relationships through subsequent convolution blocks. After the features are embedded, the BEV features are decoded in the 3D object prediction head. We evaluate our method on the nuScenes dataset, achieving 48.2 mAP and 57.6 NDS. The results show considerable improvements over the camera-only baseline, in particular a substantial reduction in the velocity prediction error. The code is available at https://github.com/Etah0409/BEV-Radar.
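To make the fusion idea concrete, the following is a minimal PyTorch sketch of a bidirectional cross-attention block over BEV feature maps. The module name BidirectionalBEVFusion, the channel sizes, and the use of nn.MultiheadAttention are illustrative assumptions rather than the released BEV-Radar implementation; refer to the repository linked above for the actual code.

import torch
import torch.nn as nn


class BidirectionalBEVFusion(nn.Module):
    """Illustrative sketch: fuse camera-BEV and radar-BEV features with
    cross-attention in both directions, then apply a convolution block to
    keep local spatial structure in the BEV grid (assumed design, not the
    authors' released code)."""

    def __init__(self, channels: int = 256, num_heads: int = 8):
        super().__init__()
        # Camera queries attend to radar keys/values, and vice versa.
        self.cam_from_radar = nn.MultiheadAttention(channels, num_heads, batch_first=True)
        self.radar_from_cam = nn.MultiheadAttention(channels, num_heads, batch_first=True)
        # Local convolution after fusion, before the 3D prediction head.
        self.local_conv = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
        )

    def forward(self, cam_bev: torch.Tensor, radar_bev: torch.Tensor) -> torch.Tensor:
        # cam_bev, radar_bev: (B, C, H, W) BEV feature maps from the two branches.
        b, c, h, w = cam_bev.shape
        cam_seq = cam_bev.flatten(2).transpose(1, 2)      # (B, H*W, C)
        radar_seq = radar_bev.flatten(2).transpose(1, 2)  # (B, H*W, C)

        # Bidirectional attention: each modality queries the other.
        cam_enh, _ = self.cam_from_radar(cam_seq, radar_seq, radar_seq)
        radar_enh, _ = self.radar_from_cam(radar_seq, cam_seq, cam_seq)

        # Restore spatial maps and fuse with the local convolution block.
        cam_map = cam_enh.transpose(1, 2).reshape(b, c, h, w)
        radar_map = radar_enh.transpose(1, 2).reshape(b, c, h, w)
        fused = self.local_conv(torch.cat([cam_map, radar_map], dim=1))
        return fused  # (B, C, H, W), passed on to the 3D object prediction head


if __name__ == "__main__":
    fusion = BidirectionalBEVFusion(channels=64, num_heads=4)
    cam = torch.randn(1, 64, 32, 32)
    radar = torch.randn(1, 64, 32, 32)
    print(fusion(cam, radar).shape)  # torch.Size([1, 64, 32, 32])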
Authors
Yuan Zhao; Lu Zhang; Jiajun Deng; Yanyong Zhang
(School of Computer Science and Technology, University of Science and Technology of China, Hefei 230027, China; Institute of Artificial Intelligence, Hefei Comprehensive National Science Center, Hefei 230088, China; Department of Electrical Engineering, University of Sydney, NSW 2006, Australia)
Source
Journal of University of Science and Technology of China
2024, No. 1, pp. 2-9, 1, I0001 (10 pages)
JUSTC