3D visual understanding oriented towards multimodal interactive fusion and progressive refinement (cited by: 1)
Abstract 3D visual understanding aims to intelligently perceive and interpret 3D scenes, enabling in-depth understanding and analysis of objects, environments, and dynamic changes. 3D object detection is its core technology and plays an indispensable role. To address the low detection accuracy of current 3D detection algorithms on distant and small targets, this paper proposes MIFPR, a 3D object detection method oriented towards multimodal interactive fusion and progressive refinement. In the feature extraction stage, the method first introduces an adaptive gated information fusion module: incorporating the geometric features of the point cloud into the image features yields an image representation that is more discriminative under changing lighting conditions. It then proposes a voxel centroid-based deformable cross-modal attention module, which drives the rich semantic features and contextual information of the image into the point cloud features. In the bounding-box refinement stage, the method proposes a progressive attention module that learns and aggregates features from different stages, continuously strengthening the model's ability to extract and model fine-grained features and progressively refining the bounding boxes. This improves detection accuracy for distant and small objects and, in turn, the capability for visual scene understanding. On the KITTI dataset, the proposed method clearly improves detection accuracy for small objects such as pedestrian and cyclist over the best baseline, confirming its effectiveness.
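The abstract does not give the form of the adaptive gate, so the following is only a minimal sketch of generic gated fusion, not the MIFPR module itself. It assumes a per-channel sigmoid gate that decides how much point-cloud geometry is added to each image feature channel. The function name `gated_fusion` and the parameters `w_img`, `w_pc`, and `bias` are hypothetical and do not come from the paper.

```python
import math


def sigmoid(x):
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(img_feat, pc_feat, w_img, w_pc, bias):
    """Fuse point-cloud features into image features, channel by channel.

    Assumed (hypothetical) form, per channel c:
        gate_c  = sigmoid(w_img[c] * img[c] + w_pc[c] * pc[c] + bias[c])
        fused_c = img[c] + gate_c * pc[c]
    A gate near 0 keeps the image feature unchanged; a gate near 1
    adds the full geometric feature.
    """
    lengths = {len(img_feat), len(pc_feat), len(w_img), len(w_pc), len(bias)}
    if len(lengths) != 1:
        raise ValueError("all inputs must have the same number of channels")

    fused = []
    for i, p, wi, wp, b in zip(img_feat, pc_feat, w_img, w_pc, bias):
        gate = sigmoid(wi * i + wp * p + b)
        fused.append(i + gate * p)
    return fused
```

In a real detector the gate weights would be learned and applied to feature maps rather than short lists; the scalar version only shows how a gate lets the network decide, channel by channel, how much geometry to trust.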
Authors He Hongtian (何鸿添); Chen Han (陈晗); Liu Yang (刘洋); Zhou Liliang (周礼亮); Zhang Min (张敏); Lei Yinjie (雷印杰). Affiliations: College of Electronics & Information Engineering, Sichuan University, Chengdu 610065, China; Key Laboratory of Optical Engineering, Institute of Optics & Electronics, Chinese Academy of Sciences, Chengdu 610209, China; CETC Key Laboratory of Avionic Information System Technology, The 10th Research Institute of China Electronics Technology Group Corporation, Chengdu 610036, China
Source Application Research of Computers (《计算机应用研究》), CSCD, Peking University Core Journals, 2024, Issue 5, pp. 1554-1561 (8 pages)
Funding General Program of the National Natural Science Foundation of China (62276176).
Keywords 3D visual understanding; multimodal; interactive fusion; progressive attention; object detection