Abstract
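Monocular visual-inertial simultaneous localization and mapping (VI-SLAM) has attracted wide attention because of its low hardware cost and because it requires no instrumentation of the external environment. It has made considerable progress over the past decade or so, producing many excellent methods and systems. Owing to the complexity of real-world scenes, each method inevitably has its own limitations. Although some works have surveyed and evaluated VI-SLAM methods, most cover only classic VI-SLAM methods and no longer adequately reflect the current state of the technology. This paper first explains the basic principles of monocular VI-SLAM and then classifies and analyzes monocular VI-SLAM methods. To compare the strengths and weaknesses of different methods comprehensively, it selects three public datasets and quantitatively evaluates representative monocular VI-SLAM methods along multiple dimensions, systematically analyzing the performance of each class of methods in practical scenarios, especially augmented reality applications. The experimental results show that methods based on optimization, or on a combination of filtering and optimization, generally outperform filtering-based methods in tracking accuracy and robustness; that direct and semi-direct methods achieve high accuracy with global-shutter cameras but are susceptible to rolling shutter and illumination changes, with error accumulating quickly in large scenes in particular; and that incorporating deep learning can improve robustness in extreme conditions. Finally, the paper discusses the development trends of SLAM around three research hotspots: combining deep learning with V-SLAM/VI-SLAM, multi-sensor fusion, and device-cloud collaboration.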
Monocular visual-inertial simultaneous localization and mapping (VI-SLAM) is an important research topic in computer vision and robotics. It aims to estimate the pose (i.e., the position and orientation) of the device in real time using a monocular camera with an inertial sensor while constructing a map of the environment. With the rapid development of fields such as augmented/virtual reality (AR/VR), robotics, and autonomous driving, monocular VI-SLAM has received widespread attention due to its advantages, including low hardware cost and no requirement for an external environment setup. Over the past decade or so, monocular VI-SLAM has made significant progress and spawned many excellent methods and systems. However, because of the complexity of real-world scenarios, different methods have also shown distinct limitations. Although some works have reviewed and evaluated VI-SLAM methods, most of them focus only on classic methods, which cannot fully reflect the latest development status of VI-SLAM technology. Based on optimization type, VI-SLAM can be divided into filtering- and optimization-based methods. Filtering-based methods use filters to fuse observations from visual and inertial sensors, continuously updating the device’s state information for localization and mapping. Additionally, depending on whether visual data association (or feature matching) is performed separately, existing methods can be divided into indirect methods (or feature-based methods) and direct methods. Furthermore, with the development and widespread application of deep learning technology, researchers have started to incorporate deep learning methods into VI-SLAM to enhance robustness in extreme conditions or perform dense reconstruction. This paper first elaborates on the basic principles of monocular VI-SLAM methods and then classifies them analytically into direct and filtering-, optimization-, feature-, and deep learning-based methods. However, most of the existing datasets and benchmarks are focused on applications like autonomous driving and dro
Authors
章国锋
黄赣
谢卫健
陈丹鹏
王楠
刘浩敏
鲍虎军
Zhang Guofeng; Huang Gan; Xie Weijian; Chen Danpeng; Wang Nan; Liu Haomin; Bao Hujun (State Key Laboratory of CAD&CG, Zhejiang University, Hangzhou 310058, China; SenseTime Research, Hangzhou 311215, China)
Source
《中国图象图形学报》
CSCD
Peking University Core Journals
2024, No. 10, pp. 2839-2858 (20 pages)
Journal of Image and Graphics
Funding
National Natural Science Foundation of China (61932003).