Feedback Learning Gaussian Appearance Network for Video Object Segmentation (反馈学习高斯表观网络的视频目标分割)

Cited by: 1
Abstract: Many deep learning based video object segmentation methods have two limitations: 1) single-frame encoded features are fed directly into the network decoder, so multi-frame features are not fully exploited and the target appearance features output by the decoder adapt poorly to complex scene changes; 2) they usually adopt a feedforward network structure, which prevents later-layer features from being fed back to earlier layers for complementary learning and limits the discriminative power of the learned appearance features. To address these limitations, this paper proposes a feedback Gaussian appearance network, which builds an online Gaussian model and feeds later-layer features back to earlier layers so as to make full use of multi-frame and multi-scale features and learn a robust appearance model for video object segmentation. The network consists of three branches: guidance, query, and segmentation. The guidance and query branches share weights to extract features from the guidance frame and the query frame, while the segmentation branch is composed of a multi-scale Gaussian appearance feature extraction module and a feedback multi-kernel fusion module. The former builds an online Gaussian model that fuses multi-frame and multi-scale features to strengthen the appearance representation, and the latter introduces a feedback mechanism to further enhance the discriminative power of the model. Finally, extensive evaluations on three benchmark datasets demonstrate the superior performance of the proposed method.
Authors: WANG Long (王龙); SONG Hui-Hui (宋慧慧); ZHANG Kai-Hua (张开华); LIU Qing-Shan (刘青山) (Collaborative Innovation Center on Atmospheric Environment and Equipment Technology, Jiangsu Key Laboratory of Big Data Analysis Technology, Nanjing University of Information Science and Technology, Nanjing 210044)
Source: Acta Automatica Sinica (《自动化学报》); indexed in EI, CAS, CSCD, and the Peking University Core Journals list; 2022, No. 3, pp. 834-842 (9 pages)
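Funding: Supported by the National Major Project of China for New Generation of Artificial Intelligence (2018AAA0100400), the National Natural Science Foundation of China (61872189, 61876088, 61532009), and the Natural Science Foundation of Jiangsu Province (BK20191397, BK20170040).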
Keywords: video object segmentation; appearance modeling; feedback mechanism; deep learning