Depth Estimation of Infrared Image Based on Pyramid Residual Neural Networks (cited by: 5)
Abstract: Depth estimation from vehicle-mounted infrared images can support night-time driver assistance systems (DAS). This paper proposes a new neural network structure for estimating the depth of infrared images. Inspired by scene classification, the regression problem of traditional depth estimation is recast as a classification problem. First, the infrared images are normalized, and the depth maps are mapped into natural-logarithm space, where distances are divided into near-to-far classes. Second, a pyramid-input residual neural network (Pyramid Residual Neural Network, PRN) is designed: the infrared image is fed to the network as a pyramid structure, and the network is divided into a coarse feature extraction part and a fine feature extraction part. Finally, the fully connected layers are replaced with fully convolutional layers, which greatly reduces the number of network parameters and the computational complexity. The pyramid input lets the network extract features at multiple scales, so object contours in the estimated depth maps are clearer than those produced by the same network with a single infrared image as input. In addition, error and accuracy metrics show that the proposed method estimates the depth of infrared images well, and comparison experiments indicate that it has advantages over the compared methods.
Authors: GU Tingting; ZHAO Haitao; SUN Shaoyuan (School of Information Science and Engineering, East China University of Science and Technology, Shanghai 200237, China; School of Information Science and Technology, Donghua University, Shanghai 201620, China)
Source: Infrared Technology (《红外技术》), indexed in CSCD and the Peking University Core Journals list, 2018, No. 5, pp. 417-423 (7 pages)
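Funding: National Natural Science Foundation of China (61375007); Basic Research Project of the Science and Technology Commission of Shanghai Municipality (15JC1400600)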
Keywords: depth estimation; vehicle infrared images; pyramid input; residual networks; multi-scale features
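
The abstract describes two preprocessing ideas that a short sketch can make concrete: dividing depth into classes in natural-logarithm space, and feeding the network an image pyramid. The Python below is a minimal illustration under stated assumptions, not the authors' implementation. The depth range (`d_min`, `d_max`), the number of classes (`num_classes`), the number of pyramid levels, the 2x2 averaging used for downsampling, and all function names are hypothetical; the abstract does not give these details.

```python
import math


def depth_to_class(depth_m, d_min=1.0, d_max=100.0, num_classes=32):
    """Map a metric depth to a class index using bins uniform in ln(depth).

    d_min, d_max and num_classes are assumed values for illustration only.
    """
    # Clamp so out-of-range depths fall into the first or last bin.
    d = min(max(depth_m, d_min), d_max)
    log_min, log_max = math.log(d_min), math.log(d_max)
    bin_width = (log_max - log_min) / num_classes
    idx = int((math.log(d) - log_min) / bin_width)
    # d == d_max would otherwise produce index num_classes.
    return min(idx, num_classes - 1)


def class_to_depth(idx, d_min=1.0, d_max=100.0, num_classes=32):
    """Return the depth at the log-space center of bin idx."""
    log_min, log_max = math.log(d_min), math.log(d_max)
    bin_width = (log_max - log_min) / num_classes
    return math.exp(log_min + (idx + 0.5) * bin_width)


def downsample2x(image):
    """Halve a 2-D list-of-lists image by averaging each 2x2 block."""
    h = len(image) // 2 * 2
    w = len(image[0]) // 2 * 2
    return [
        [
            (image[r][c] + image[r][c + 1]
             + image[r + 1][c] + image[r + 1][c + 1]) / 4.0
            for c in range(0, w, 2)
        ]
        for r in range(0, h, 2)
    ]


def build_pyramid(image, levels=3):
    """Return [full resolution, 1/2, 1/4, ...] as a multi-scale input."""
    pyramid = [image]
    for _ in range(levels - 1):
        pyramid.append(downsample2x(pyramid[-1]))
    return pyramid
```

Because the bins are uniform in log space, nearby distances get narrower bins than far ones, so the classes are finer where a driver assistance system is likely to need more precision. A network trained this way would predict a class index per pixel, and `class_to_depth` converts that index back to a representative distance.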
