
Real-Time Action Detection Based on Spatio-Temporal Interaction Perception
Abstract  Spatio-temporal action detection depends on learning both the spatial and the temporal information in video. Current state-of-the-art action detectors based on convolutional neural networks (CNNs) adopt 2D CNN or 3D CNN architectures and achieve remarkable results. However, owing to the complexity of the network structures and of spatio-temporal information perception, these methods usually run in a non-real-time, offline fashion. The main challenges in spatio-temporal action detection are to design an efficient detection network architecture and to perceive and fuse spatio-temporal features effectively. To address these problems, this paper proposes a real-time action detection method based on spatio-temporal interaction perception. First, the input video frames are shuffled out of order to enhance the temporal information. Because a 2D or 3D backbone network alone cannot model spatio-temporal features effectively, a multi-branch feature extraction network based on spatio-temporal interaction perception is proposed to extract features from different sources. Because single-scale spatio-temporal features lack descriptive power, a multi-scale attention network is proposed to learn long-term temporal dependencies and spatial context information. For the fusion of temporal and spatial features, which come from two different sources, a new motion-saliency-enhanced fusion strategy is proposed: spatio-temporal information is encoded and cross-mapped to guide the fusion between temporal and spatial features and to highlight more discriminative spatio-temporal feature representations. Finally, action tube links are computed online from the frame-level detector results. The proposed method achieves accuracies of 84.71% and 78.4% on the two spatio-temporal action datasets UCF101-24 and JHMDB-21, respectively, outperforming existing state-of-the-art methods, and runs at 73 frames per second. In addition, to address the high inter-class similarity and easily confused hard samples in the JHMDB-21 dataset, this paper further proposes a key-frame optical-flow action detection method based on action representation, which avoids computing redundant optical flow and further improves action detection accuracy.
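The motion-saliency-enhanced fusion described above can be illustrated with a minimal sketch: the temporal stream is squashed into a [0, 1] saliency map that reweights the spatial features before a residual add. This is only a hedged illustration of the gating idea; the function name `motion_saliency_fuse` and this particular encoding are assumptions, and the paper's actual cross-mapping is more elaborate.

```python
import numpy as np

def motion_saliency_fuse(spatial, temporal):
    """Saliency-gated fusion sketch (illustrative, not the authors' exact design).

    The temporal feature map is passed through a sigmoid to obtain a
    motion-saliency gate in [0, 1], which reweights the spatial features;
    a residual connection keeps the original spatial signal.
    """
    gate = 1.0 / (1.0 + np.exp(-temporal))  # sigmoid saliency map
    return spatial + spatial * gate         # residual, saliency-weighted sum

# Example: with a zero temporal map, the gate is uniformly 0.5,
# so the fused output is 1.5x the spatial features.
fused = motion_saliency_fuse(np.ones((2, 2)), np.zeros((2, 2)))
```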
Authors  KE Xiao, MIAO Xin, GUO Wen-zhong (柯逍, 缪欣, 郭文忠) — College of Computer and Data Science, Fuzhou University, Fuzhou, Fujian 350116, China; Fujian Provincial Key Laboratory of Networking Computing and Intelligent Information Processing, Fuzhou University, Fuzhou, Fujian 350116, China; Key Laboratory of Spatial Data Mining & Information Sharing, Ministry of Education, Fuzhou, Fujian 350003, China
Source  Acta Electronica Sinica (《电子学报》), indexed in EI, CAS, CSCD, Peking University Core, 2024, No. 2, pp. 574-588 (15 pages)
Funding  National Natural Science Foundation of China (No. 61972097, No. U21A20472); National Key R&D Program of China (No. 2021YFB3600503); Major Science and Technology Project of Fujian Province (No. 2021HZ022007); Natural Science Foundation of Fujian Province (No. 2021J01612, No. 2020J01494)
Keywords  real-time action detection; multi-scale attention; spatio-temporal interaction perception
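The abstract's final step, computing action tube links online from frame-level detector results, can be sketched as a greedy IoU-based linker: each active tube is extended with the best-overlapping detection in the new frame, and unmatched detections start new tubes. The function names, the IoU threshold, and the IoU-times-score matching rule below are illustrative assumptions, not the authors' exact procedure.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    if inter <= 0:
        return 0.0
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def link_tubes(frames, iou_thr=0.3):
    """Greedy online tube linking sketch (illustrative assumption).

    frames: per-frame lists of detections, each detection a (box, score)
    pair. Each tube is a list of (frame_idx, box, score) triples.
    """
    tubes = []
    for t, dets in enumerate(frames):
        unused = list(dets)
        for tube in tubes:
            last_box = tube[-1][1]
            # extend the tube with the best unused match above the IoU threshold
            best, best_val = None, 0.0
            for d in unused:
                overlap = iou(last_box, d[0])
                if overlap >= iou_thr and overlap * d[1] > best_val:
                    best, best_val = d, overlap * d[1]
            if best is not None:
                tube.append((t, best[0], best[1]))
                unused.remove(best)
        for d in unused:  # unmatched detections start new tubes
            tubes.append([(t, d[0], d[1])])
    return tubes
```

Because tubes are extended frame by frame using only past detections, this kind of linker runs online, consistent with the real-time setting the paper targets.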