Abstract
Current 3D-ConvNet-based action recognition algorithms generally use global average pooling (GAP) to compress feature information, which leads to information loss, information redundancy, and network overfitting. To address these problems and better retain the high-level semantic information extracted by the convolutional layers, this paper proposes an action recognition algorithm based on global frequency domain pooling (GFDP). First, analysis via the discrete cosine transform (DCT) shows that GAP is a special case of feature decomposition in the frequency domain; the algorithm therefore introduces additional frequency components to increase the specificity between feature channels and reduce the redundancy that remains after compression. Second, to better suppress overfitting, the batch normalization strategy used in convolutional layers is adopted and extended to the fully connected layer of an action recognition model built on an ERB (efficient residual block)-Res3D backbone, in order to optimize the data distribution. Finally, the method is validated on the UCF101 dataset. The results show that the model requires 3.5 GFlops of computation and has 7.4 M parameters, and that its recognition accuracy is 3.9% higher than that of the ERB-Res3D model and 17.4% higher than that of the original Res3D model, achieving more accurate action recognition efficiently.
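The claim that GAP is a special case of a frequency-domain decomposition can be checked numerically. The sketch below is an illustration only, not the authors' implementation: it assumes an unnormalized 3D DCT-II basis over the (T, H, W) axes of a feature map, and the helper names `dct_basis_1d` and `dct_pool`, the tensor shape, and the chosen frequency triples are all hypothetical. It shows that the zero-frequency component equals GAP up to a constant factor, and that pooling with further frequency components yields additional descriptors per channel.

```python
import numpy as np


def dct_basis_1d(n, k):
    """Unnormalized DCT-II basis vector of frequency k over n samples."""
    return np.cos(np.pi * k * (np.arange(n) + 0.5) / n)


def dct_pool(x, freq):
    """Project each channel of x, shaped (C, T, H, W), onto one 3D DCT basis.

    freq is a (u, v, w) triple of frequency indices along T, H and W.
    Hypothetical helper for illustration; not taken from the paper.
    """
    _, t, h, w = x.shape
    bt = dct_basis_1d(t, freq[0])
    bh = dct_basis_1d(h, freq[1])
    bw = dct_basis_1d(w, freq[2])
    # The 3D basis is separable, so contract with the three 1D bases directly.
    return np.einsum("cthw,t,h,w->c", x, bt, bh, bw)


rng = np.random.default_rng(0)
x = rng.standard_normal((8, 4, 7, 7))  # assumed shape: (C, T, H, W)

# Global average pooling: one scalar per channel.
gap = x.mean(axis=(1, 2, 3))

# Zero-frequency DCT component: the basis is all ones, so this is the
# per-channel sum, i.e. GAP multiplied by T * H * W.
dc = dct_pool(x, (0, 0, 0)) / (4 * 7 * 7)
print(np.allclose(gap, dc))

# Pooling with several frequency triples (an arbitrary choice made here)
# gives one descriptor per channel and per frequency.
freqs = [(0, 0, 0), (0, 0, 1), (0, 1, 0), (1, 0, 0)]
pooled = np.stack([dct_pool(x, f) for f in freqs], axis=1)
print(pooled.shape)
```

How the frequency components are selected and assigned to channels in GFDP is not stated in the abstract, so the selection above should be read as a placeholder.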
Authors
贾志超
张海超
张闯
颜蒙蒙
储金祺
颜之岳
Jia Zhichao; Zhang Haichao; Zhang Chuang; Yan Mengmeng; Chu Jinqi; Yan Zhiyue (College of Electronic & Information Engineering, Nanjing University of Information Science & Technology, Nanjing 210044, China; Jiangsu Key Laboratory of Meteorological Observation & Information Processing, Nanjing 210044, China)
Source
《计算机应用研究》
CSCD
Peking University Core Journal
2024, Issue 9, pp. 2867-2873 (7 pages)
Application Research of Computers
Funding
National Natural Science Foundation of China (Grant No. 62272234).