Abstract
This paper first discusses the basis for distinguishing Chinese dialects and the basic principles for selecting features, and from these derives a novel feature, the district differential cepstral feature. It then builds a dialect identification system that combines a GMM tokenizer, N-gram language models, and an artificial neural network (ANN). Compared with traditional language identification (LID) systems, the new system has three characteristics. First, it does not require a labeled speech corpus, which reduces the labor and the requirements involved in building Chinese dialect speech databases. Second, the GMM tokenizer requires far less computation than a phoneme recognizer, which speeds up dialect identification and makes future real-time processing more feasible. Third, it achieves higher identification accuracy and better robustness (fault tolerance). In identification experiments on Mandarin Chinese and three Chinese dialects, the system reaches an average identification rate of 83.8%.
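The abstract does not give the GMM size, the N-gram order, the smoothing method, or how the ANN is attached, so the following is only a minimal sketch of the general GMM-tokenizer plus N-gram idea, not the authors' implementation. It assumes a diagonal-covariance GMM whose parameters are already known, maps each feature frame to the index of its highest-scoring component (which is why no labeled transcription is needed), trains one add-k smoothed bigram model per dialect on those token sequences, and picks the dialect with the highest average log-probability. The argmax decision stands in for the paper's ANN, and the names `gmm_tokenize`, `BigramLM`, `identify`, the smoothing constant `k=0.5`, and the toy dialects "A" and "B" are all hypothetical.

```python
import math
from collections import Counter


def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at frame x."""
    return -0.5 * sum(
        math.log(2 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def gmm_tokenize(frames, weights, means, variances):
    """Hypothetical tokenizer: map each frame to the index of the GMM
    component with the highest weighted log-likelihood."""
    tokens = []
    for x in frames:
        scores = [math.log(w) + log_gauss_diag(x, m, v)
                  for w, m, v in zip(weights, means, variances)]
        tokens.append(max(range(len(scores)), key=scores.__getitem__))
    return tokens


class BigramLM:
    """Add-k smoothed bigram model over GMM token indices
    (an assumed stand-in for the paper's N-gram model)."""

    def __init__(self, num_tokens, k=0.5):
        self.V = num_tokens
        self.k = k
        self.bigrams = Counter()
        self.history = Counter()

    def train(self, sequences):
        for seq in sequences:
            for a, b in zip(seq, seq[1:]):
                self.bigrams[(a, b)] += 1
                self.history[a] += 1
        return self

    def prob(self, a, b):
        """Smoothed P(b | a); sums to 1 over all b for a fixed a."""
        return ((self.bigrams[(a, b)] + self.k)
                / (self.history[a] + self.k * self.V))

    def avg_logprob(self, seq):
        pairs = list(zip(seq, seq[1:]))
        if not pairs:
            return 0.0  # no bigrams: score carries no information
        return sum(math.log(self.prob(a, b)) for a, b in pairs) / len(pairs)


def identify(frames, gmm, models):
    """Tokenize with the shared GMM, then return the dialect whose
    bigram model gives the highest average log-probability
    (simplified argmax in place of the paper's ANN)."""
    tokens = gmm_tokenize(frames, *gmm)
    return max(models, key=lambda name: models[name].avg_logprob(tokens))


if __name__ == "__main__":
    # Toy 1-D example: two GMM components, two hypothetical dialects.
    gmm = ([0.5, 0.5], [[0.0], [5.0]], [[1.0], [1.0]])
    models = {
        "A": BigramLM(2).train([[0, 1, 0, 1, 0, 1, 0, 1]]),
        "B": BigramLM(2).train([[0, 0, 0, 0, 1, 1, 1, 1]]),
    }
    frames = [[0.1], [4.9], [-0.2], [5.1], [0.0], [5.2]]
    print(identify(frames, gmm, models))  # prints A
```

In a real system the GMM would be estimated with EM on unlabeled speech, and the feature frames would be the paper's cepstral features rather than toy scalars; in the paper the final decision also involves an ANN, which this sketch replaces with a plain argmax.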
Source
Journal of Chinese Information Processing (中文信息学报)
CSCD
Peking University Core Journal
2006, No. 5, pp. 77-82 (6 pages)
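Funding
Jiangsu Provincial "Tenth Five-Year Plan" Social Science Fund (K3-013)
Natural Science Foundation of Jiangsu Higher Education Institutions (99KJB510002)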
Keywords
computer application
Chinese information processing
GMM tokenizer
N-gram language modeling
Chinese dialect identification