Abstract
To address insufficient knowledge acquisition in Chinese language models, this paper presents a mechanism that combines statistical information with several kinds of rules. Rules are represented quantitatively, and the concept of a syntactic-semantic rule matrix is introduced. By augmenting the word lattice and adjusting the maximum-likelihood N-gram probabilities, phrase-structure rules expressed as a context-free grammar (CFG), bigram syntactic-semantic rules, and the least-segmentation principle are embedded in the N-gram statistical framework, yielding a multi-knowledge-source language model. Applied to an intelligent Pinyin-to-character conversion system, the model markedly improves conversion accuracy.
Source
Journal of Computer Research and Development (《计算机研究与发展》)
Indexed in: EI, CSCD, PKU Core
2002, Issue 2, pp. 231-235 (5 pages)
Funding
National Natural Science Foundation of China (Grant No. 69973015)
Heilongjiang Province Outstanding Youth Fund (Grant No. F020604)