Abstract
Depth estimation from a single image is an important step in understanding scene geometry, but because of scale ambiguity it is widely regarded in computer vision as a classic ill-posed problem. Although recent supervised learning methods have achieved promising results for monocular depth estimation, they require large amounts of ground-truth depth data, which is costly to collect. In addition, existing methods suffer from well-known problems such as moving objects, occlusions, and illumination, which lead to unsatisfactory performance, particularly at object edges and in low-texture regions. To address these problems, we propose a self-attention-based multi-stage network for unsupervised monocular depth estimation. The method has the following features: 1) the multi-stage network structure provides stronger constraints and supervision for depth estimation during training; 2) the network is optimized with a mask-weighted reconstruction loss and a left-right disparity consistency loss; 3) a self-attention module is adopted to capture more context information and thereby improve the predictions. Experimental results on the KITTI dataset show that the method matches or exceeds existing methods, indicating that it can effectively improve monocular depth estimation.
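The abstract names a left-right disparity consistency loss without defining it. As a rough illustration only, the sketch below shows one common form of such a term for rectified stereo pairs; it is not taken from the paper. The function name, the convention that a left-view pixel at column x matches the right-view pixel at column x - d, and the nearest-neighbour sampling are all assumptions made for this example.

```python
def left_right_consistency_loss(disp_left, disp_right):
    """Mean |d_l(x, y) - d_r(x - d_l(x, y), y)| over pixels whose match is in bounds.

    disp_left, disp_right: lists of rows of disparities in pixels, same shape.
    Assumptions (not from the paper): a left-view pixel at column x matches the
    right-view pixel at column x - d, and sampling is nearest-neighbour.
    """
    total, count = 0.0, 0
    for row_l, row_r in zip(disp_left, disp_right):
        width = len(row_l)
        for x, d in enumerate(row_l):
            x_right = int(round(x - d))  # matching column in the right view
            if 0 <= x_right < width:     # skip matches that fall outside the image
                total += abs(d - row_r[x_right])
                count += 1
    return total / count if count else 0.0
```

When the two disparity maps agree at every matched pixel the loss is zero, and it grows with the disagreement between the two views. A trainable version would typically use differentiable (bilinear) sampling in place of the rounding used here.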
Authors
Liu Xiangning (刘香凝)
Zhao Yang (赵洋)
Wang Ronggang (王荣刚)
Affiliations: School of Electronic and Computer Engineering, Shenzhen Graduate School, Peking University, Shenzhen, Guangdong 518055, China; Peng Cheng Laboratory, Shenzhen, Guangdong 518055, China; School of Computer and Information, Hefei University of Technology, Hefei, Anhui 230009, China
Source
Journal of Signal Processing (《信号处理》)
CSCD
Peking University Core Journal (北大核心)
2020, No. 9, pp. 1450-1456 (7 pages)
Funding
National Natural Science Foundation of China (61672063)
Shenzhen Research Projects (JCYJ20180503182128089, 201806080921419290)
Keywords
unsupervised learning
monocular depth estimation
multi-stage network
self-attention