Abstract
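Video activity recognition has recently become a research focus in computer vision. Depending on the object of recognition, the task can be divided into individual action recognition and group activity recognition. This paper focuses on group activity recognition, that is, recognizing and analyzing the behavior of the crowd as a whole in a video scene. Most existing group activity recognition methods use multi-layer temporal network models to learn individual action features that capture temporal change, and then aggregate them into a group activity feature. In this aggregation step, however, previous methods do not effectively account for differences in how much each individual contributes to the group activity, which degrades recognition performance. To address this, the paper proposes an attention pooling mechanism for aggregating individual action features and builds on it a new group activity recognition model that recognizes individual actions and group activity hierarchically and simultaneously, in a bottom-up manner. First, a convolutional neural network extracts static individual features from the person image patches in the video; these features are fed into a multi-layer recurrent neural network temporal model to obtain dynamic individual features. Next, the attention pooling mechanism aggregates the individual features into the corresponding group activity feature. Finally, the individual and group activity features are used to recognize individual actions and group activity simultaneously. To verify the effectiveness of the proposed method, a series of experiments was conducted on the widely used The Volleyball Dataset. The results show that the proposed model achieves good classification accuracy and outperforms current state-of-the-art models.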
In group activity recognition, the hierarchical framework is widely used to represent the relationships between individuals and their corresponding groups, and it has achieved promising performance. However, existing methods simply employ max/average pooling in this framework, overlooking the distinct contributions of different individuals to group activity recognition. In this paper, we propose a new contextual pooling scheme, named attentive pooling, which enables weighted information transition from individual actions to group activity. Because it uses the attention mechanism, attentive pooling is intrinsically interpretable and can embed the member context in the existing hierarchical model. To verify the effectiveness of the proposed scheme, two specific attentive pooling methods, i.e., global attentive pooling (GAP) and hierarchical attentive pooling (HAP), are designed. GAP rewards individuals significant to the group activity, while HAP further considers the hierarchical division by introducing the subgroup structure. Experimental results on the benchmark dataset demonstrate that the proposed scheme is significantly superior to the baseline and comparable to state-of-the-art methods.
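The contrast between average pooling and attention-weighted pooling described in the abstract can be illustrated with a minimal sketch. The abstract does not specify how the attention scores are computed, so the dot product against a `query` vector used here is an assumption made for illustration only, and the names `attentive_pool`, `average_pool`, and `query` are hypothetical rather than taken from the paper.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def average_pool(features):
    """Baseline: every individual contributes equally to the group feature."""
    n = len(features)
    dim = len(features[0])
    return [sum(f[d] for f in features) / n for d in range(dim)]


def attentive_pool(features, query):
    """Weighted sum of individual features.

    ASSUMPTION: scores are dot products with a query vector; the paper's
    actual scoring function is not given in the abstract.
    """
    scores = [sum(x * q for x, q in zip(f, query)) for f in features]
    weights = softmax(scores)
    dim = len(features[0])
    pooled = [sum(w * f[d] for w, f in zip(weights, features)) for d in range(dim)]
    return pooled, weights


# Three individuals with 2-D features (toy values, not real data).
features = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
query = [2.0, 0.0]  # hypothetical query favoring the first dimension

pooled, weights = attentive_pool(features, query)
uniform_pooled, uniform_weights = attentive_pool(features, [0.0, 0.0])
```

With an all-zero query every score is equal, so the weights become uniform and the result matches average pooling; a non-zero query shifts weight toward the individuals whose features align with it.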
Authors
李定
张文生
Ding LI; Wensheng ZHANG (Institute of Automation, Chinese Academy of Sciences, Beijing 100091, China; School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing 100049, China)
Source
《中国科学:信息科学》
CSCD
Peking University Core Journals (北大核心)
2021, No. 3, pp. 399-412 (14 pages)
Scientia Sinica (Informationis)
Funding
Supported by the Science and Technology Innovation 2030 Major Project on "New Generation Artificial Intelligence" (Grant No. 2018AAA0102100).
Keywords
group activity recognition (群体行为识别)
representation learning (表示学习)
attention mechanism (注意力机制)
deep learning (深度学习)