Abstract
This paper proposes a method for filtering and optimizing hierarchical phrase-based (HPB) translation models. Starting from the HPB rules obtained with the traditional training method, the method uses forced alignment to build bilingual derivation trees that cover the source and target sentences simultaneously, then filters and extracts the aligned HPB rules from these trees. Finally, the extracted rules are used to re-estimate the translation probabilities of the model. The method requires no linguistic knowledge and is suitable for training on large-scale corpora. On large-scale Chinese-English translation evaluation tasks, it filters out about 50% of the original HPB rules while improving translation performance by 0.8 to 1.2 BLEU points on the test sets, compared with the traditional HPB model.
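The filtering and re-estimation step described above can be illustrated with a minimal sketch. It assumes that forced alignment has already produced, for each sentence pair, the list of HPB rules used in its derivation tree, and that probabilities are re-estimated by plain relative frequency; the abstract does not state the estimator, so this choice and the names `reestimate` and `filter_rules` are hypothetical.

```python
from collections import Counter, defaultdict


def reestimate(derivations):
    """Re-estimate rule probabilities from forced-alignment derivations.

    derivations: iterable of lists of (source_side, target_side) rules,
    one list per sentence pair (assumed input format).
    Returns {rule: (p(target | source), p(source | target))}.
    Relative-frequency estimation is an assumption, not the paper's
    confirmed estimator.
    """
    rule_counts = Counter()
    for derivation in derivations:
        rule_counts.update(derivation)

    # Marginal counts for each source side and each target side.
    src_totals = defaultdict(int)
    tgt_totals = defaultdict(int)
    for (src, tgt), count in rule_counts.items():
        src_totals[src] += count
        tgt_totals[tgt] += count

    return {
        (src, tgt): (count / src_totals[src], count / tgt_totals[tgt])
        for (src, tgt), count in rule_counts.items()
    }


def filter_rules(original_rules, table):
    """Keep only the original rules that appear in some derivation."""
    return [rule for rule in original_rules if rule in table]
```

Under these assumptions, any rule from the original table that never appears in a forced-alignment derivation is dropped, which is one way the reported reduction in rule count could arise.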
Source
Journal of Chinese Information Processing (《中文信息学报》)
CSCD
PKU Core (北大核心)
2013, No. 6, pp. 134-138, 150 (6 pages)
Funding
National High Technology Research and Development Program of China (863 Program), Grant No. 2011AA01A207
Keywords
statistical machine translation
hierarchical phrase-based model
forced alignment
model training