
Application of Deep Learning in Statistical Machine Translation Domain Adaptation (cited by: 4)
Abstract: Statistical machine translation (SMT) often has to handle input texts that come from diverse sources and belong to different domains. To improve translation quality across domains, the training corpus needs to be filtered according to the text to be translated so that the system adapts to its domain. Current domain adaptation methods for SMT take the target data as the reference and rely mainly on statistical techniques to adjust the training data or the translation model; they lack explicit domain labels. Building on our group's earlier work, this study models short texts with a convolutional neural network (CNN) and constructs a suitable network structure for supervised learning in order to capture the semantic information of whole sentences. The training corpus is classified and filtered according to the domain information of the text to be translated, which yields training data in the same domain as that text, and the selected data are then used in an SMT system. We tested the approach on English abstracts from Wanfang Data. Using only part of the training data, the system obtained a higher BLEU score than the system trained on the full original training data, which indicates that the method is effective and feasible.
Authors: DING Liang (丁亮), YAO Changqing (姚长青), HE Yanqing (何彦青), LI Hui (李辉). Affiliations: Institute of Scientific and Technical Information of China, Beijing 100038, China; Key Laboratory of Rich-media Knowledge Organization and Service of Digital Publishing Content, SAPPRFT, Beijing 100038, China; Beijing Institute of Science and Technology Information, Beijing 100044, China.
Source: Technology Intelligence Engineering (《情报工程》), 2017, No. 3, pp. 64-76 (13 pages)
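Funding: Supported by the National Natural Science Foundation of China (grants 61303152, 71503240, and 71403257) and the Key Work Project of the Institute of Scientific and Technical Information of China (ZD2017-4).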
Keywords: statistical machine translation, training data selection, convolutional neural network, deep learning
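The selection step described in the abstract (label each training sentence with a domain, then keep only the pairs that match the domain of the text to be translated) can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: `select_in_domain`, `toy_classifier`, the domain labels, and the choice to classify the source side are hypothetical, and the keyword lookup merely stands in for the trained CNN classifier, whose architecture the abstract does not specify.

```python
from typing import Callable, Iterable, List, Tuple

SentencePair = Tuple[str, str]  # (source sentence, target sentence)


def select_in_domain(
    pairs: Iterable[SentencePair],
    classify: Callable[[str], str],
    target_domain: str,
) -> List[SentencePair]:
    """Keep only the pairs whose source side is labeled with target_domain.

    `classify` is any function mapping a sentence to a domain label; in the
    paper's setting this role would be played by the trained CNN.
    """
    return [(src, tgt) for src, tgt in pairs if classify(src) == target_domain]


def toy_classifier(sentence: str) -> str:
    """Hypothetical stand-in for the CNN: a simple keyword lookup."""
    medical_terms = {"patient", "clinical", "therapy"}
    words = set(sentence.lower().split())
    return "medicine" if words & medical_terms else "other"


# Tiny illustrative corpus; the target sides are placeholders.
corpus = [
    ("the patient received therapy", "<target 1>"),
    ("the bridge uses steel cables", "<target 2>"),
    ("clinical results were positive", "<target 3>"),
]

selected = select_in_domain(corpus, toy_classifier, "medicine")
print(len(selected), "of", len(corpus), "pairs kept")
```

Passing the classifier as a function keeps the filtering logic independent of the model, so a real sentence classifier could be substituted without changing `select_in_domain`. The selected subset would then be used to train the SMT system in place of the full corpus.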
