Abstract
A new method for predicting Chinese prosodic words is proposed. Analysis of a manually annotated corpus reveals the relationship between lexical words and prosodic words: 24% of prosodic words consist of two or more lexical words, and the length of a lexical word is the main feature for determining prosodic word boundaries. Based on this analysis, a prosodic word prediction method is implemented using transformation-based error-driven learning (TBL) over lexical features. Experimental results show that the proposed method reaches a prediction precision of 97.5% on the test set and outperforms other methods.
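The abstract does not spell out the rule templates, so the following is only a minimal sketch of how transformation-based error-driven learning could be applied to prosodic word boundaries. It assumes a single hypothetical rule template (the lengths of the two lexical words around a boundary), an initial state in which every lexical word boundary is also a prosodic word boundary, and a toy corpus whose labels are invented for illustration; none of these details come from the paper.

```python
def features(words):
    """Assumed feature: (length of word i, length of word i+1) per internal boundary."""
    return [(len(words[i]), len(words[i + 1])) for i in range(len(words) - 1)]


def apply_rule(rule, feats, labels):
    """Rule ((a, b), v): where the word-length pair is (a, b), set the label to v."""
    pair, new_label = rule
    return [new_label if f == pair else y for f, y in zip(feats, labels)]


def count_errors(pred, gold):
    return sum(p != g for p, g in zip(pred, gold))


def learn_tbl(corpus, max_rules=10):
    """Greedy error-driven learning.

    corpus: list of (words, gold); gold[i] is 1 if a prosodic word boundary
    follows words[i], 0 if words[i] and words[i+1] share a prosodic word.
    """
    feats = [features(words) for words, _ in corpus]
    golds = [gold for _, gold in corpus]
    # Assumed initial state: every lexical word boundary is a prosodic boundary.
    preds = [[1] * len(f) for f in feats]
    candidates = sorted({(f, v) for fs in feats for f in fs for v in (0, 1)})
    rules = []
    for _ in range(max_rules):
        best = None
        best_err = sum(count_errors(p, g) for p, g in zip(preds, golds))
        for rule in candidates:
            err = sum(
                count_errors(apply_rule(rule, f, p), g)
                for f, p, g in zip(feats, preds, golds)
            )
            if err < best_err:  # keep the rule that removes the most errors
                best, best_err = rule, err
        if best is None:  # no candidate rule reduces the error count
            break
        rules.append(best)
        preds = [apply_rule(best, f, p) for f, p in zip(feats, preds)]
    return rules


def predict(words, rules):
    """Apply the learned rules, in order, to the initial all-boundary labelling."""
    feats = features(words)
    labels = [1] * len(feats)
    for rule in rules:
        labels = apply_rule(rule, feats, labels)
    return labels


# Toy corpus with invented labels (illustration only, not data from the paper).
corpus = [
    (["我们", "的", "学校"], [0, 1]),
    (["他", "在", "北京"], [0, 1]),
    (["学习", "汉语"], [1]),
]
rules = learn_tbl(corpus)
prediction = predict(["你们", "的", "老师"], rules)
```

On this toy data the learner ends up with rules that attach a one-character word to the word before it, which mirrors the abstract's observation that lexical word length is the main cue. A realistic system would need richer templates and a held-out test set.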
Source
Journal of Northwest Normal University (Natural Science) (《西北师范大学学报(自然科学版)》)
CAS
2008, No. 1, pp. 47-51 (5 pages)
Funding
Northwest Normal University Research Backbone Training Project (NWNU-KJCXGC-03-42)
Keywords
prosodic word
lexical word
transformation-based error-driven learning (TBL)
text-to-speech