
Two-stream CNN for action recognition based on video segmentation (cited by: 8)
Abstract: To address the low accuracy of the original spatial-temporal two-stream Convolutional Neural Network (CNN) model for action recognition in long, complex videos, a two-stream CNN action recognition method based on video segmentation is proposed. First, a video is split into multiple non-overlapping segments of equal length. From each segment, one frame image is randomly sampled to represent its static features, and stacked optical flow images are computed to represent its motion features. Second, these two types of images are fed into the spatial CNN and the temporal CNN, respectively, for feature extraction, and the segment features are then fused within each stream to obtain the spatial and temporal class prediction features. Finally, the prediction features of the two streams are integrated to obtain the action recognition result for the video. In a series of experiments, several data augmentation techniques and transfer learning schemes are discussed to mitigate the over-fitting caused by insufficient training samples, and the effects of the number of segments, the pre-trained network, the segment feature fusion scheme, and the two-stream integration strategy on recognition performance are analyzed. The experimental results show that the proposed model reaches an action recognition accuracy of 91.80% on the UCF101 dataset, 3.8 percentage points higher than the original two-stream CNN model; on the HMDB51 dataset its accuracy reaches 61.39%, also higher than that of the original model. This indicates that the proposed model can better learn and represent human action features in long, complex videos.
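The pipeline described in the abstract (equal-length segmentation, random sampling per segment, per-stream fusion of segment features, then two-stream integration) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not say which fusion scheme or integration strategy performed best, so averaging over segments and an equally weighted sum of softmax probabilities are assumptions made here. The function names, the `flow_stack` length of 10, and the stand-in `spatial_cnn` / `temporal_cnn` callables are likewise hypothetical.

```python
import numpy as np


def sample_segment_indices(num_frames, num_segments, flow_stack=10, rng=None):
    """Split [0, num_frames) into equal, non-overlapping segments and draw one
    random start index per segment, leaving room for a stack of flow frames."""
    rng = np.random.default_rng() if rng is None else rng
    seg_len = num_frames // num_segments
    if seg_len < flow_stack:
        raise ValueError("segments are shorter than the optical-flow stack")
    starts = np.arange(num_segments) * seg_len
    # Offsets are chosen so that start + flow_stack stays inside the segment.
    offsets = rng.integers(0, seg_len - flow_stack + 1, size=num_segments)
    return starts + offsets


def softmax(x):
    """Numerically stable softmax over the last axis."""
    z = x - x.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def fuse_segments(segment_scores):
    """Assumed fusion scheme: average class scores over segments.
    Input shape is (segments, classes)."""
    return np.mean(segment_scores, axis=0)


def integrate_streams(spatial_scores, temporal_scores, temporal_weight=0.5):
    """Assumed integration strategy: weighted sum of per-stream softmax outputs."""
    w = temporal_weight
    return (1.0 - w) * softmax(spatial_scores) + w * softmax(temporal_scores)


def recognize(num_frames, spatial_cnn, temporal_cnn, num_segments=3, rng=None):
    """spatial_cnn and temporal_cnn are hypothetical callables that map a
    frame index to a vector of class scores."""
    idx = sample_segment_indices(num_frames, num_segments, rng=rng)
    spatial = fuse_segments(np.stack([spatial_cnn(i) for i in idx]))
    temporal = fuse_segments(np.stack([temporal_cnn(i) for i in idx]))
    probs = integrate_streams(spatial, temporal)
    return int(np.argmax(probs)), probs
```

In practice the two callables would wrap trained networks that read an RGB frame and a stack of optical flow images at the sampled index; here they are left abstract so that only the segmentation, fusion, and integration steps are shown.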
Authors: WANG Ping; PANG Wenhao (School of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, Shaanxi 710049, China)
Source: Journal of Computer Applications (《计算机应用》), indexed in CSCD and the Peking University Core Journals list; 2019, No. 7, pp. 2081-2086 (6 pages)
Funding: National Natural Science Foundation of China (61671365)
Keywords: two-stream Convolutional Neural Network (CNN); action recognition; video segmentation; transfer learning; feature fusion
