Abstract
Transition segments play an important role in the clarity, intelligibility, and auditory perception of speech. In parametric speech coding, whether frames containing transition segments are handled appropriately is key to whether the synthesized speech is clear and intelligible. Taking mixed excitation linear prediction (MELP) coding as a reference, this paper classifies speech frames into four types (silent, unvoiced, voiced, and transition) and processes each type separately. Building on previous work on low-bit-rate (< 1 kbps) speech coding, it compares the effect of eight transition-frame classification methods on the PESQ MOS (Perceptual Evaluation of Speech Quality, Mean Opinion Score) of the synthesized speech. The analysis shows that different transition frames contribute differently to the PESQ MOS: transitions from unvoiced or silent frames to voiced frames contribute the most, and transitions between voiced consonants and vowels should not be neglected either.
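The four-way frame labelling described in the abstract can be illustrated with a minimal sketch. It is not the paper's method: the abstract does not describe the eight classification methods it compares, so the features (short-time energy and zero-crossing rate), the thresholds `silence_energy` and `voiced_zcr`, and all function names below are assumptions chosen for illustration. Only the frame length of 180 samples (22.5 ms at 8 kHz) follows standard MELP. The sketch marks just the onset case, where a voiced frame follows a silent or unvoiced one; transitions between voiced consonants and vowels would need spectral features that are not modelled here.

```python
import math

FRAME_LEN = 180  # 22.5 ms at 8 kHz, the standard MELP frame size


def frame_features(frame):
    """Return (mean energy, zero-crossing rate) for one frame of samples."""
    energy = sum(x * x for x in frame) / len(frame)
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return energy, crossings / (len(frame) - 1)


def base_label(frame, silence_energy=1e-4, voiced_zcr=0.15):
    """Label one frame silent / unvoiced / voiced.

    Both thresholds are illustrative assumptions, not values from the paper.
    """
    energy, zcr = frame_features(frame)
    if energy < silence_energy:
        return "silent"
    return "voiced" if zcr < voiced_zcr else "unvoiced"


def classify(samples, frame_len=FRAME_LEN):
    """Split samples into whole frames and assign one of four labels."""
    frames = [samples[i:i + frame_len]
              for i in range(0, len(samples) - frame_len + 1, frame_len)]
    labels = [base_label(f) for f in frames]
    result = list(labels)
    for i in range(1, len(labels)):
        # Onset: a voiced frame right after a silent or unvoiced frame.
        if labels[i] == "voiced" and labels[i - 1] in ("silent", "unvoiced"):
            result[i] = "transition"
    return result


# Synthetic test signal: silence, two tone frames, one noise-like frame, one tone frame.
tone = [0.5 * math.sin(2 * math.pi * 100 * n / 8000) for n in range(FRAME_LEN)]
noise = [0.3 if n % 2 == 0 else -0.3 for n in range(FRAME_LEN)]
signal = [0.0] * FRAME_LEN + tone + tone + noise + tone
labels = classify(signal)
print(labels)
```

On this synthetic signal, the tone frames that directly follow the silent frame and the noise-like frame are the ones relabelled as transition frames. A real coder would likely use more robust voicing decisions, such as the band-pass voicing strengths MELP already computes.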
Source
Journal of Applied Acoustics (《应用声学》)
CSCD
PKU Core Journal (北大核心)
2016, Issue 1, pp. 77-83 (7 pages)
Funding
National Natural Science Foundation of China (Grant No. 61302109)
Keywords
Low bit rate speech codec
Mixed excitation linear prediction
Transition segment