Abstract
Speech and facial expressions are the two main channels through which people express emotion, and they correspond to the two main modalities of emotional expression: the auditory modality and the visual modality. Most current emotion recognition methods rely on a single modality, but single-modality methods suffer from incomplete information and sensitivity to noise. To address these problems, this paper proposes a bimodal emotion recognition method that fuses auditory and visual information. First, a Convolutional Neural Network and a pre-trained facial expression model are used to extract audio features from the speech signal and visual features from the visual signal, respectively. The two sets of features are then fused and compressed so that the correlations between the modalities are fully exploited. Finally, a Long Short-Term Memory recurrent neural network performs emotion recognition on the fused audio-visual features. The method effectively captures the intrinsic correlations between the auditory and visual modalities and thereby improves emotion recognition performance. The method is validated on the RECOLA dataset, and the experimental results show that the bimodal model outperforms models that use only images or only audio.
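The data flow described in the abstract (per-frame features from each modality, fusion and compression, then a recurrent pass over time) can be illustrated with a small dependency-free sketch. This is not the authors' implementation: the feature extractors are assumed to have already produced per-frame vectors, fusion is assumed to be concatenation followed by a linear projection with tanh, the weights are random stand-ins for trained parameters, biases are omitted, and all function names, dimensions, and the number of outputs are hypothetical.

```python
import math
import random


def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def rand_matrix(rows, cols, rng):
    """Small random weights; stand-ins for trained parameters."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fuse(audio_feat, visual_feat, W_proj):
    """Assumed fusion: concatenate both modality vectors, then compress
    them with a linear projection followed by tanh."""
    joint = audio_feat + visual_feat  # list concatenation
    return [math.tanh(v) for v in matvec(W_proj, joint)]


def lstm_step(x, h, c, W):
    """One LSTM step. W maps a gate name to a weight matrix applied to
    the concatenation [x; h] (biases omitted for brevity)."""
    xh = x + h
    i = [sigmoid(v) for v in matvec(W["i"], xh)]    # input gate
    f = [sigmoid(v) for v in matvec(W["f"], xh)]    # forget gate
    o = [sigmoid(v) for v in matvec(W["o"], xh)]    # output gate
    g = [math.tanh(v) for v in matvec(W["g"], xh)]  # candidate cell state
    c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
    h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]
    return h_new, c_new


def make_params(audio_dim, visual_dim, fused_dim, hidden, n_out, seed=0):
    """Build random parameters for the projection, the LSTM gates, and a
    linear output layer (all sizes are illustrative)."""
    rng = random.Random(seed)
    return {
        "proj": rand_matrix(fused_dim, audio_dim + visual_dim, rng),
        "lstm": {g: rand_matrix(hidden, fused_dim + hidden, rng) for g in "ifog"},
        "out": rand_matrix(n_out, hidden, rng),
    }


def predict_sequence(audio_seq, visual_seq, params):
    """Fuse each frame's features, run the LSTM over time, and emit one
    output vector per frame."""
    hidden = len(params["lstm"]["i"])
    h, c = [0.0] * hidden, [0.0] * hidden
    outputs = []
    for a, v in zip(audio_seq, visual_seq):
        x = fuse(a, v, params["proj"])
        h, c = lstm_step(x, h, c, params["lstm"])
        outputs.append(matvec(params["out"], h))
    return outputs
```

Calling `predict_sequence` with two equally long lists of per-frame feature vectors returns one output vector per frame. In a real system the projection, gate, and output weights would be learned jointly, so that the compressed representation keeps the cross-modal information the recurrent layer needs.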
Authors
范习健
杨绪兵
张礼
业巧林
业宁
Fan Xijian; Yang Xubing; Zhang Li; Ye Qiaolin; Ye Ning (College of Information Science and Technology, Nanjing Forestry University, Nanjing, 210037, China)
Source
《南京大学学报(自然科学版)》
CAS
CSCD
PKU Core (Peking University Core Journals)
2021, Issue 2, pp. 309-317 (9 pages)
Journal of Nanjing University (Natural Science)
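Funding
National Natural Science Foundation of China (61902187)
Natural Science Foundation of Liaoning Province (2020-KF-22-04)
Nanjing Science and Technology Innovation Project for Overseas Returnees; Jiangsu Province Innovation and Entrepreneurship ("Shuangchuang") Talent Program.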
Keywords
emotion recognition
feature fusion
Convolutional Neural Network
Long Short-Term Memory