Emotion Recognition from Raw Speech Based on Sinc-Transformer Model
Cited by: 8
Abstract: Considering the tedium of manually extracting acoustic features in traditional speech emotion recognition tasks, this paper proposes a Sinc-Transformer (SincNet Transformer) model that performs emotion recognition directly on raw speech. The model combines the advantages of the SincNet layer and the Transformer encoder. SincNet filters capture important narrow-band emotional features from the raw speech waveform, so the front end of the network is guided during feature extraction and performs the shallow feature extraction from the raw signal; two Transformer encoder layers then process these features further to extract deep feature vectors containing global contextual information. On the four-class emotion task of the Interactive Emotional Dyadic Motion Capture (IEMOCAP) database, the proposed Sinc-Transformer model achieves an accuracy of 64.14% and an unweighted average recall of 65.28%. Compared with the baseline models, the proposed model effectively improves speech emotion recognition performance.
Authors: YU Jiajia; JIN Yun; MA Yong; JIANG Fangjiao; DAI Yanyan (School of Physics and Electronic Engineering, Jiangsu Normal University, Xuzhou, Jiangsu 221116, China; Kewen College, Jiangsu Normal University, Xuzhou, Jiangsu 221116, China; School of Linguistic Sciences and Arts, Jiangsu Normal University, Xuzhou, Jiangsu 221116, China)
Source: Journal of Signal Processing (《信号处理》), indexed in CSCD and the Peking University Core Journals list, 2021, No. 10, pp. 1880-1888 (9 pages)
Funding: National Natural Science Foundation of China Youth Program (52005267); Natural Science Foundation of the Higher Education Institutions of Jiangsu Province (18KJB510013, 17KJB510018)
Keywords: speech emotion; Transformer model encoder; SincNet filter; raw speech
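The SincNet front end described in the abstract constrains each convolution kernel to be a parameterized band-pass filter: the difference of two windowed sinc low-pass filters, where only the two cutoff frequencies are learned. The following pure-Python sketch builds one such kernel with fixed (not learned) cutoffs; the function name and the parameter values (`kernel_len=251`, `sr=16000`, the 300-3000 Hz band) are illustrative assumptions, not values taken from the paper.

```python
import math

def sinc(x):
    # Unnormalized sinc, sin(x)/x, with the removable singularity sinc(0) = 1
    return 1.0 if x == 0 else math.sin(x) / x

def sinc_bandpass_kernel(f1_hz, f2_hz, kernel_len=251, sr=16000):
    """Windowed-sinc band-pass kernel in the style of a SincNet filter.

    f1_hz / f2_hz play the role of the learnable low and high cutoff
    frequencies (fixed here for illustration). kernel_len should be odd
    so the kernel is symmetric about its center tap.
    """
    f1, f2 = f1_hz / sr, f2_hz / sr  # cutoffs normalized by sample rate
    half = kernel_len // 2
    kernel = []
    for i in range(kernel_len):
        n = i - half
        # Difference of two low-pass sinc filters -> band-pass response
        h = (2 * f2 * sinc(2 * math.pi * f2 * n)
             - 2 * f1 * sinc(2 * math.pi * f1 * n))
        # Hamming window to suppress side lobes of the truncated sinc
        w = 0.54 - 0.46 * math.cos(2 * math.pi * i / (kernel_len - 1))
        kernel.append(h * w)
    return kernel

# One band-pass kernel covering roughly the 300-3000 Hz speech band
k = sinc_bandpass_kernel(300.0, 3000.0)
```

In SincNet proper, a bank of such kernels (one cutoff pair per filter) is convolved with the raw waveform as the first layer, so gradient descent adjusts only the band edges rather than every kernel tap; the Transformer encoder layers then operate on the resulting feature sequence.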
