Abstract
Emotional speech carries rich information and has broad application prospects in human-computer interaction (HCI). The Mel frequency scale was proposed on the basis of human auditory characteristics and corresponds nonlinearly to frequency in Hz. Mel-frequency cepstral coefficients (MFCC) are features computed from the Hz spectrum using this relationship, and they are widely used in speech recognition. Because of the nonlinear Hz-Mel correspondence, the computational accuracy of MFCC decreases as frequency increases, so applications usually keep only the low-frequency MFCCs and discard the mid- and high-frequency ones. This paper studies that problem: it modifies the nonlinear Hz-Mel mapping to improve the accuracy of the mid- and high-frequency coefficients, and uses them as a complement to the low-frequency MFCCs in speech emotion recognition. Experiments show that, compared with the classical algorithm, the improved algorithm raises the recognition rate to varying degrees across different feature combinations, which demonstrates the effectiveness of the proposed Mid MFCC feature computation.
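The loss of resolution at higher frequencies can be illustrated with the conventional Hz-Mel mapping. The sketch below is not the modified mapping proposed in the paper, which the abstract does not specify. It assumes the commonly used formula mel = 2595 · log10(1 + f / 700), and the function names, the 0 to 8000 Hz range, and the 24-filter count are illustrative choices only.

```python
import math


def hz_to_mel(f_hz: float) -> float:
    """Conventional Hz-to-Mel mapping (assumed form, not the paper's modified one)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel: float) -> float:
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_band_edges(f_min: float, f_max: float, n_filters: int) -> list[float]:
    """Edge frequencies in Hz of n_filters triangular filters spaced evenly on the Mel scale."""
    m_lo, m_hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (m_hi - m_lo) / (n_filters + 1)
    return [mel_to_hz(m_lo + i * step) for i in range(n_filters + 2)]


# Hypothetical setup: 24 filters covering 0 to 8000 Hz.
edges = mel_band_edges(0.0, 8000.0, 24)

# Each triangular filter spans from edges[i] to edges[i + 2].
widths = [edges[i + 2] - edges[i] for i in range(24)]

for i in (0, 11, 23):
    print(f"filter {i:2d}: {edges[i]:8.1f} Hz to {edges[i + 2]:8.1f} Hz, width {widths[i]:7.1f} Hz")
```

Because the filters are equally spaced in Mel, their width in Hz grows steadily with centre frequency, so each high-frequency filter averages over a much wider band than a low-frequency one. This is one way to read the abstract's remark that MFCC accuracy falls as frequency rises.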
Source
Journal of Chongqing University of Posts and Telecommunications (Natural Science Edition) (《重庆邮电大学学报(自然科学版)》)
2008, No. 5, pp. 597-602 (6 pages)
Funding
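Program for New Century Excellent Talents in University
Natural Science Foundation of Chongqing (CSTC2007BB2445)
Open Project Fund of the Chongqing Key Laboratory of Computer Network and Communication Technology, "Research on Key Technologies of Emotion Recognition"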
Keywords
Mel-frequency cepstral coefficients (MFCC)
speech emotion recognition
affective computing