Abstract
Facial expression recognition remains a challenging problem in computer vision. In recent years, with the rapid development of deep learning, methods based on convolutional neural networks have greatly improved recognition accuracy. However, these methods do not make full use of the information in a face image: the features that matter for expression recognition are concentrated in a few key regions, such as the eyes, nose, and mouth, so increasing the weight of these regions during feature extraction can improve recognition. This paper therefore proposes a facial expression recognition network based on an attention mechanism. First, a deep and shallow feature fusion structure is added to the backbone network; it extracts shallow features at several scales from the original image and cascades them with the deep features to reduce information loss during forward propagation. Second, a two-step channel attention module is embedded in the network to encode the channel information of the cascaded feature map into a channel attention map, which is multiplied element-wise with the cascaded feature map to give a channel-weighted feature map. Third, multiscale feature extraction is combined with spatial attention in a multiscale spatial attention module, which weights the different positions of the channel-weighted feature map to give a spatially weighted feature map. Finally, the feature map weighted over both channels and spatial positions is passed to the subsequent network for further feature extraction and classification. Experimental results show that, compared with existing deep-learning-based methods, the proposed method improves expression recognition accuracy by 0 to 3% on the extended Cohn-Kanade dataset and by 1% to 8% on the OULU-CASIA NIR (near infrared) & VIS (visible light) dataset, which demonstrates its effectiveness.
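The weighting pipeline described in the abstract (fuse shallow and deep features, weight the channels, then weight the spatial positions) can be illustrated with a minimal NumPy sketch. This is not the authors' network. The abstract does not specify the two-step channel attention or the multiscale spatial attention in detail, so the sketch substitutes a squeeze-and-excitation-style channel gate and a multi-window mean filter as stand-ins. It also assumes that "cascading" means concatenation along the channel axis. All function names, shapes, weights, and window sizes below are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def fuse_features(shallow_maps, deep_map):
    """Concatenate shallow and deep maps along the channel axis.
    Assumes every map is (C_i, H, W) with the same H and W."""
    return np.concatenate(list(shallow_maps) + [deep_map], axis=0)

def channel_attention(x, w1, w2):
    """SE-style stand-in (not the paper's two-step module):
    global average pooling, two dense layers, one weight per channel."""
    squeezed = x.mean(axis=(1, 2))            # (C,)
    hidden = np.maximum(w1 @ squeezed, 0.0)   # ReLU, (C_hidden,)
    weights = sigmoid(w2 @ hidden)            # (C,), each in (0, 1)
    return x * weights[:, None, None], weights

def box_filter(m, k):
    """Mean filter with odd window size k; edge padding keeps the shape."""
    p = k // 2
    padded = np.pad(m, p, mode="edge")
    out = np.zeros_like(m, dtype=float)
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + m.shape[0], dx:dx + m.shape[1]]
    return out / (k * k)

def multiscale_spatial_attention(x, scales=(1, 3, 5)):
    """Stand-in for multiscale spatial attention: pool over channels,
    smooth at several window sizes, average, and squash to (0, 1)."""
    pooled = x.mean(axis=0)                                   # (H, W)
    multi = np.mean([box_filter(pooled, k) for k in scales], axis=0)
    attn = sigmoid(multi)                                     # (H, W)
    return x * attn[None, :, :], attn

# Toy data: two shallow maps and one deep map on an 8x8 grid.
rng = np.random.default_rng(0)
shallow = [rng.standard_normal((4, 8, 8)), rng.standard_normal((4, 8, 8))]
deep = rng.standard_normal((8, 8, 8))

fused = fuse_features(shallow, deep)              # 16 channels after fusion
w1 = rng.standard_normal((4, 16)) * 0.1           # hypothetical dense weights
w2 = rng.standard_normal((16, 4)) * 0.1
channel_weighted, cw = channel_attention(fused, w1, w2)
spatial_weighted, sw = multiscale_spatial_attention(channel_weighted)
print(fused.shape, cw.shape, sw.shape, spatial_weighted.shape)
```

In a trained network the dense weights and the spatial filters would be learned rather than random or fixed. The sketch only shows the order of operations and that both attention maps act by element-wise multiplication on the fused feature map.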
Authors
Zhang Wei (张为)
Li Pu (李璞)
School of Microelectronics, Tianjin University, Tianjin 300072, China
Source
《天津大学学报(自然科学与工程技术版)》
EI
CAS
CSCD
Peking University Core Journals (北大核心)
2022, Issue 7, pp. 706-713 (8 pages)
Journal of Tianjin University: Science and Technology
Funding
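Major Science and Technology Project on New Generation Artificial Intelligence (No. 19ZXZNGX00030)
Key Research Project of the Scientific Research Program of the Fire and Rescue Department, Ministry of Emergency Management (No. 2019XFGG20)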
Keywords
facial expression recognition
convolutional neural network
attention mechanism
deep and shallow feature fusion