Abstract
Recognizing human actions in videos quickly and effectively has broad application prospects and considerable potential economic value. However, current video action recognition methods are easily affected by background factors such as swaying of the moving person, background changes, camera shake, and shadows cast by the moving person. To address these problems, we propose the non-local temporal segment network (NLTSNet). Built on the two-stream temporal segment network, NLTSNet adds non-local modules to the ResNet backbone so that the network can attend to information over a larger spatial and temporal range of the video clip. To further improve robustness against complex static backgrounds, we integrate optical flow into the non-local module so that attention is placed more precisely on the action region. Finally, to fuse the multiple predictions of the two-stream segment network, we replace simple averaging with a learnable weighted average over the appearance and temporal modalities. Experiments on the TDAP dataset show that the model recognizes human actions against complex backgrounds more accurately than several state-of-the-art methods, and improves on the original model with almost no increase in time complexity.
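The abstract names two mechanisms without giving formulas: a non-local computation that lets each position draw on the whole clip, and a learnable weighted average that fuses the two streams. The sketch below illustrates both in plain Python under stated assumptions. It follows the general non-local (self-attention) formulation with a dot-product affinity, omits the learned embeddings a real module would use, and does not model the optical-flow guidance. The function names `non_local` and `fuse`, and the choice of softmax-normalized fusion weights, are illustrative assumptions, not the authors' implementation.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def non_local(features):
    """Non-local operation with dot-product affinity and a residual connection.

    features: list of N position vectors (each a list of C floats), where one
    position stands for one spatio-temporal location in the clip.
    Assumption: identity embeddings; a trained module would learn projections.
    """
    out = []
    for x_i in features:
        # Affinity of position i to every position j, normalized by softmax.
        scores = [sum(a * b for a, b in zip(x_i, x_j)) for x_j in features]
        weights = softmax(scores)
        # Weighted sum over all positions, so each output sees the whole clip.
        agg = [sum(w * x_j[c] for w, x_j in zip(weights, features))
               for c in range(len(x_i))]
        out.append([a + b for a, b in zip(x_i, agg)])  # residual connection
    return out


def fuse(rgb_scores, flow_scores, logits):
    """Weighted average of two streams' class scores.

    logits: two learnable parameters (hypothetical); softmax turns them into
    weights that sum to 1. Equal logits reduce to the simple average.
    """
    w_rgb, w_flow = softmax(logits)
    return [w_rgb * r + w_flow * f for r, f in zip(rgb_scores, flow_scores)]
```

With equal logits, `fuse` gives the simple average that the paper replaces. Training the logits lets the model favor whichever stream is more reliable.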
Authors
潘陈听
谭晓阳
PAN Chen-ting; TAN Xiao-yang (College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China; MIIT Key Laboratory of Pattern Analysis and Machine Intelligence, Nanjing 211106, China; Collaborative Innovation Center of Novel Software Technology and Industrialization, Nanjing 211106, China)
Source
《计算机与现代化》
2020, No. 7, pp. 97-103 (7 pages)
Computer and Modernization
Funding
National Natural Science Foundation of China (61976115, 61672280, 61732006)
NUAA AI+ Project (56XZA18009).
Keywords
action recognition
non-local module
temporal segment network
complex background
self-attention