
Video Action Recognition Based on Spatio-Temporal Feature Pyramid Module (cited by 5)
Abstract: Mainstream 2D convolutional neural network methods for video action recognition cannot extract correlations between input frames, so the network fails to capture spatio-temporal feature information across frames and struggles to improve recognition accuracy. To address this, a general spatio-temporal feature pyramid module (STFPM) is proposed. STFPM consists of a feature pyramid and a dilated convolution pyramid, and can be embedded directly into existing 2D convolutional networks to form a new action recognition network, the spatio-temporal feature pyramid net (STFP-Net). Given multi-frame input, STFP-Net first extracts the spatial features of each input frame individually and records them as the original features. The designed STFPM then uses matrix transformations to construct a feature pyramid from the original features. Next, the dilated convolution pyramid is applied to this feature pyramid to extract temporal features that carry spatio-temporal correlations. The original features and the temporal features are then fused by weighted summation and passed on to the subsequent deep layers. Finally, a fully connected layer classifies the network's output features. Compared with the baseline, STFP-Net introduces negligible additional parameters and computation. Experimental results show that, compared with mainstream methods of recent years, STFP-Net achieves a clear improvement in classification accuracy on the standard UCF101 and HMDB51 datasets.
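The pipeline the abstract describes (per-frame spatial features, a pyramid of dilated temporal aggregations, weighted fusion with the original features) can be sketched in toy form. This is a hypothetical simplification, not the paper's code: scalar values stand in for per-frame feature maps, fixed sums replace learned dilated convolutions, and the names `dilated_temporal_agg`, `stfpm`, and the weight `alpha` are invented for illustration.

```python
def dilated_temporal_agg(frames, dilation):
    """Aggregate each frame with its neighbour `dilation` steps ahead
    (zero-padded past the end), mimicking one level of a dilated
    temporal kernel."""
    T = len(frames)
    out = []
    for t in range(T):
        neighbour = frames[t + dilation] if t + dilation < T else 0.0
        out.append(frames[t] + neighbour)
    return out

def stfpm(frames, dilations=(1, 2, 4), alpha=0.5):
    """Toy STFPM: build a pyramid of dilated temporal aggregations,
    average the levels into a temporal feature, then fuse it with the
    original features by the weight `alpha` (assumed, not from the paper)."""
    pyramid = [dilated_temporal_agg(frames, d) for d in dilations]
    temporal = [sum(level[t] for level in pyramid) / len(dilations)
                for t in range(len(frames))]
    return [alpha * frames[t] + (1.0 - alpha) * temporal[t]
            for t in range(len(frames))]

feats = [1.0, 2.0, 3.0, 4.0]   # scalar stand-ins for 4 frames' features
fused = stfpm(feats)           # same length as input, frames now mixed in time
```

In the real network each pyramid level would be a learned dilated convolution over stacked frame features, but the structure — parallel dilation rates covering different temporal ranges, then a weighted residual-style fusion — is what lets a 2D backbone see across frames at negligible extra cost.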
Authors: GONG Suming; CHEN Ying (Key Laboratory of Advanced Process Control for Light Industry, Ministry of Education, Jiangnan University, Wuxi, Jiangsu 214122, China)
Source: Journal of Frontiers of Computer Science and Technology (《计算机科学与探索》, CSCD, Peking University Core), 2022, No. 9, pp. 2061-2067 (7 pages)
Funding: National Natural Science Foundation of China (61573168)
Keywords: action recognition; 2D convolutional network; spatio-temporal features; feature pyramid; dilated convolution pyramid
