Abstract
Mongolian is characterized by derivational and inflectional suffixation: a Mongolian word can usually be segmented into a stem and several suffixes. A conventional N-gram language model has difficulty capturing the long-distance dependencies between a stem and its suffixes. This paper proposes a language model that exploits long-distance dependencies, called the Skip-N language model, which defines bigram dependencies between words separated by N words. The model was implemented and applied to an example-based Chinese-Mongolian machine translation system, and experiments show that the Skip-N language model effectively improves translation quality.
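To make the Skip-N idea concrete, here is a minimal Python sketch of a skip-bigram estimator. It assumes that "separated by N words" means N intervening tokens, so each modeled pair is (w[i], w[i+N+1]), and that the input text has already been segmented into stems and suffixes. The function names, the add-k smoothing and the placeholder tokens (stemA, +suf1, and so on) are illustrative assumptions, not the authors' implementation. The abstract does not say how the Skip-N score is combined with other models in the translation system, so the sketch scores sentences with it alone.

```python
import math
from collections import Counter

# Hypothetical sketch: the offset convention (n intervening tokens) and the
# add-k smoothing are assumptions, not taken from the paper.

def skip_bigram_counts(sentences, n):
    """Count pairs (w[i], w[i + n + 1]), i.e. words with n tokens in between."""
    pair_counts = Counter()
    left_counts = Counter()
    for tokens in sentences:
        for i in range(len(tokens) - n - 1):
            left, right = tokens[i], tokens[i + n + 1]
            pair_counts[(left, right)] += 1
            left_counts[left] += 1
    return pair_counts, left_counts

def skip_bigram_prob(left, right, pair_counts, left_counts, vocab_size, k=1.0):
    """Add-k smoothed estimate of P(right | left) at skip distance n."""
    return (pair_counts[(left, right)] + k) / (left_counts[left] + k * vocab_size)

def sentence_log_prob(tokens, n, pair_counts, left_counts, vocab_size):
    """Sum of log skip-bigram probabilities over every position that has a left partner."""
    total = 0.0
    for i in range(n + 1, len(tokens)):
        total += math.log(
            skip_bigram_prob(tokens[i - n - 1], tokens[i], pair_counts, left_counts, vocab_size)
        )
    return total

if __name__ == "__main__":
    # Placeholder segmented corpus: a stem, one intervening token, then a suffix.
    corpus = [["stemA", "x", "+suf1"], ["stemA", "y", "+suf1"], ["stemB", "x", "+suf2"]]
    vocab_size = len({w for s in corpus for w in s})
    pc, lc = skip_bigram_counts(corpus, n=1)
    print(skip_bigram_prob("stemA", "+suf1", pc, lc, vocab_size))  # 0.375
```

With n=1 the model links stemA directly to +suf1 even though a different word sits between them, which is the kind of stem-suffix dependency that an ordinary bigram model, looking only at adjacent tokens, cannot see.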
Source
Journal of Inner Mongolia University: Natural Science Edition (《内蒙古大学学报(自然科学版)》)
Indexed in: CAS, CSCD, PKU Core Journals (北大核心)
2008, No. 2, pp. 220-224 (5 pages)
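Funding
Supported by the Inner Mongolia Natural Science Foundation project "Research on Building Language Models for Mongolian Text" (200607010805)
Supported by the National Natural Science Foundation of China project "Research on Statistical Machine Translation Methods Based on Phrase-Structure Transformation Templates" (60573188)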
Keywords
machine translation
Mongolian
language model