
MRTP: Multi-Temporal Resolution Real-Time Action Recognition Approach by Time-Action Perception (cited by: 2)
Abstract: To address the uneven distribution of spatio-temporal information in action recognition and the difficulty of capturing long-time-span representations, this paper proposes MRTP, a time-action-perception multi-temporal-resolution real-time action recognition method. Taking RGB video as input, MRTP applies two parallel perception paths to extract spatial and motion features at different temporal resolutions. In the spatial path, feature-difference-based action perception locates and strengthens channel-wise motion representations; in the action path, channels are filtered by the action-perception weights, and channel attention and temporal attention are added to emphasize key features. The features extracted by the two paths are then fused, an activation function maps the fused features to per-class scores, and the class with the highest score is taken as the final prediction. Experimental results show that MRTP reaches 95.6% accuracy on the UCF101 dataset, outperforming the variant without temporal attention, and 28% mean average precision (mAP) on the AVA2.2 dataset, outperforming the variant without action perception and temporal attention. Compared with mainstream methods, including optical-flow-based two-stream networks, 3D-convolution networks such as SlowFast, and Transformer-based models, in terms of accuracy, parameter count, and processing speed, the proposed method shows better recognition performance and robustness.
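The feature-difference action perception and temporal attention described in the abstract can be illustrated with a minimal NumPy sketch. This is not the paper's implementation; the function names, the global-average pooling, and the sigmoid/softmax gating are illustrative assumptions about how such modules are commonly realized:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def motion_perception(features):
    """Sketch of feature-difference action perception (assumed design).

    features: array of shape (T, C, H, W) -- feature maps for T frames.
    Returns channel-reweighted features of the same shape.
    """
    # Adjacent-frame feature differences approximate motion.
    diff = features[1:] - features[:-1]             # (T-1, C, H, W)
    # Pool over time and space to get per-channel motion energy.
    energy = np.abs(diff).mean(axis=(0, 2, 3))      # (C,)
    # Map energy to (0, 1) weights: high-motion channels are emphasized.
    weights = sigmoid(energy - energy.mean())       # (C,)
    return features * weights[None, :, None, None]

def temporal_attention(features):
    """Sketch of temporal attention: reweight frames by importance."""
    T = features.shape[0]
    desc = features.mean(axis=(1, 2, 3))            # (T,) per-frame descriptor
    attn = np.exp(desc - desc.max())
    attn /= attn.sum()                              # softmax over time
    # Scale by T so the average feature magnitude is roughly preserved.
    return features * attn[:, None, None, None] * T
```

Both functions keep the input shape, so they can be dropped between convolutional stages of either path before the final fusion step.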
Authors: ZHANG Kun; YANG Jing; ZHANG Dong; CHEN Yuehai; LI Jie; DU Shaoyi (School of Automation Science and Engineering, Xi'an Jiaotong University, Xi'an 710049, China; College of Artificial Intelligence, Xi'an Jiaotong University, Xi'an 710049, China)
Source: Journal of Xi'an Jiaotong University (EI, CAS, CSCD indexed), 2022, No. 3, pp. 22-32.
Funding: National Natural Science Foundation of China (62073257).
Keywords: action recognition; dual-path network; feature difference; action perception; temporal attention
