Research in corpus filtering technique for Uyghur-Chinese machine translation

Cited by: 2
Abstract: Statistical machine translation (SMT) is currently the mainstream machine translation technique, and its strong performance on Uyghur-Chinese translation is widely recognized. As with SMT in general, the final quality of Uyghur-Chinese SMT depends on several factors: the translation model, the language model, and the quality and scale of the training corpus. This paper aims to improve Uyghur-Chinese SMT performance by filtering the Uyghur-Chinese bilingual training corpus. Building on prior research, it proposes three methods: a modified IBM Model 1 for evaluating sentence alignment quality, bilingual language model perplexity for corpus filtering, and taking the intersection of the sentence pairs selected by multiple filtering metrics. These methods do not depend on language-specific features, so they are well suited to filtering Uyghur-Chinese bilingual corpora. Experimental results show that the proposed methods yield better Uyghur-Chinese machine translation results.
Authors: Kong Jinying (孔金英), Wen Zhengyang (温政阳), Yang Yating (杨雅婷), Wang Lei (王磊), Li Xiao (李晓). Affiliations: Xinjiang Technical Institute of Physics & Chemistry, Chinese Academy of Sciences, Urumqi 830011, China; Xinjiang Laboratory of Minority Speech & Language Information Processing, Urumqi 830011, China; University of Chinese Academy of Sciences, Beijing 100049, China; Experimental Center for Electronic Data Identification of Urumqi Municipal Public Security Bureau, Urumqi 830000, China; Institute of Acoustics of Chinese Academy of Sciences, Beijing 100190, China.
Source: Application Research of Computers, 2016, No. 12, pp. 3654-3657 (4 pages). Indexed in CSCD and the Peking University Core Journals list.
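Funding: West Light Foundation of the Chinese Academy of Sciences (XBBS201216, LHXZ201301); Strategic Priority Research Program of the Chinese Academy of Sciences (XDA06030400); Youth Natural Science Foundation of Xinjiang Uygur Autonomous Region (2015211B034); Open Project of the Xinjiang Uygur Autonomous Region Key Laboratory (2015KL031).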
Keywords: Uyghur-Chinese machine translation; corpus filtering; language model
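The abstract names three filtering ideas without giving their formulas, so the following is only a rough sketch of how such a pipeline could look, not the authors' method. It assumes standard (unmodified) IBM Model 1 trained with EM and scored as a per-target-word log-probability; it substitutes the average of two separate add-one-smoothed bigram perplexities (Uyghur side and Chinese side) for the paper's bilingual language model; and it keeps only the pairs that rank in the top `ratio` fraction under both scores. The names `train_ibm1`, `ibm1_score`, `BigramLM`, `keep_best`, `filter_corpus`, the `ratio` parameter, and the romanized toy sentence pairs are all hypothetical.

```python
import math
from collections import defaultdict

NULL = "<NULL>"  # empty source word that any target word may align to in IBM Model 1

# Hypothetical toy data: rough romanized Uyghur -> segmented Chinese.
TOY_PAIRS = [
    (["men", "kitab", "oquymen"], ["我", "读", "书"]),
    (["men", "kitab", "alimen"], ["我", "买", "书"]),
    (["sen", "kitab", "oquysen"], ["你", "读", "书"]),
    (["men", "mektepke", "barimen"], ["我", "去", "学校"]),
    (["sen", "mektepke", "barisen"], ["你", "去", "学校"]),
    (["ular", "tamaq", "yeydu"], ["天气", "很", "好", "今天", "下雨"]),  # misaligned pair
]


def train_ibm1(pairs, iterations=10):
    """Estimate IBM Model 1 lexical probabilities t[(f, e)] = t(f | e) with EM."""
    tgt_vocab = {f for _, tgt in pairs for f in tgt}
    t = defaultdict(lambda: 1.0 / len(tgt_vocab))  # uniform initialisation
    for _ in range(iterations):
        count = defaultdict(float)  # expected counts c(f, e)
        total = defaultdict(float)  # expected counts c(e)
        for src, tgt in pairs:
            src_null = [NULL] + src
            for f in tgt:
                z = sum(t[(f, e)] for e in src_null)
                for e in src_null:
                    delta = t[(f, e)] / z  # E-step: posterior that f aligns to e
                    count[(f, e)] += delta
                    total[e] += delta
        t = {fe: c / total[fe[1]] for fe, c in count.items()}  # M-step
    return t


def ibm1_score(src, tgt, t, floor=1e-9):
    """Per-target-word log P(tgt | src) under IBM Model 1, ignoring the constant epsilon."""
    src_null = [NULL] + src
    logp = 0.0
    for f in tgt:
        p = sum(t.get((f, e), 0.0) for e in src_null) / len(src_null)
        logp += math.log(max(p, floor))
    return logp / max(len(tgt), 1)


class BigramLM:
    """Add-one smoothed bigram language model over token lists."""

    def __init__(self, sentences):
        self.history = defaultdict(int)  # c(a): times a occurs as a context
        self.bigram = defaultdict(int)   # c(a, b)
        vocab = set()
        for s in sentences:
            toks = ["<s>"] + s + ["</s>"]
            vocab.update(toks)
            for a, b in zip(toks, toks[1:]):
                self.history[a] += 1
                self.bigram[(a, b)] += 1
        self.v = len(vocab) + 1  # one extra slot for unknown words

    def perplexity(self, sentence):
        toks = ["<s>"] + sentence + ["</s>"]
        logp = sum(
            math.log((self.bigram.get((a, b), 0) + 1) / (self.history.get(a, 0) + self.v))
            for a, b in zip(toks, toks[1:])
        )
        return math.exp(-logp / (len(toks) - 1))


def keep_best(scores, ratio, higher_is_better=True):
    """Indices of the best `ratio` fraction of items by score."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=higher_is_better)
    return set(order[: max(1, int(len(scores) * ratio))])


def filter_corpus(pairs, ratio=0.8):
    """Keep pairs that rank well on BOTH alignment score and perplexity."""
    t = train_ibm1(pairs)
    src_lm = BigramLM([s for s, _ in pairs])
    tgt_lm = BigramLM([g for _, g in pairs])
    align = [ibm1_score(s, g, t) for s, g in pairs]
    ppl = [(src_lm.perplexity(s) + tgt_lm.perplexity(g)) / 2 for s, g in pairs]
    kept = keep_best(align, ratio) & keep_best(ppl, ratio, higher_is_better=False)
    return [pairs[i] for i in sorted(kept)]


if __name__ == "__main__":
    for src, tgt in filter_corpus(TOY_PAIRS):
        print(" ".join(src), "|||", " ".join(tgt))
```

Intersecting the two filters means a sentence pair survives only if it looks both well aligned and fluent. In the toy data, the last pair, whose Chinese side is unrelated to its Uyghur side, has the highest average perplexity and is dropped. In a real setting the language models would more plausibly be trained on separate, trusted in-domain text rather than on the corpus being filtered.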