
Cognate-Corpus-Enhanced Low-Resource Neural Machine Translation
Abstract: Low-resource machine translation, which lacks parallel sentence pairs, faces the scientific problem of cross-lingual semantic transfer. Focusing on the specific low-resource task of Indonesian-to-Chinese machine translation, this paper explores a data augmentation method based on a cognate corpus and trains a stronger neural machine translation (NMT) model by mixing the cognate corpus into the training data. In Indonesian-Chinese machine translation experiments, this mixed-corpus model improves the BLEU-4 score by more than 3 points. The results show that a cognate corpus can effectively enhance low-resource NMT performance, and that this effectiveness stems mainly from the morphological similarity and semantic equivalence between cognate languages.
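The abstract does not specify how the cognate corpus is mixed into training. The sketch below assumes the simplest reading: the cognate corpus is a Malay-Chinese parallel corpus (Malay appears among the keywords), which is concatenated with the Indonesian-Chinese pairs and shuffled before NMT training. The function `mix_cognate_corpus` and its `upsample_target` parameter are hypothetical illustrations, not the authors' code.

```python
import random
from typing import List, Tuple

# A parallel sentence pair: (source-language sentence, Chinese target sentence).
Pair = Tuple[str, str]


def mix_cognate_corpus(
    id_zh: List[Pair],
    ms_zh: List[Pair],
    upsample_target: int = 1,
    seed: int = 0,
) -> List[Pair]:
    """Hypothetical helper: build one NMT training set from in-language
    Indonesian-Chinese pairs plus cognate Malay-Chinese pairs.

    upsample_target repeats the Indonesian-Chinese pairs so the scarce
    in-language data is not swamped by the cognate data. This is an
    assumption for illustration, not a detail given in the abstract.
    """
    if upsample_target < 1:
        raise ValueError("upsample_target must be >= 1")
    # Concatenate new lists; the caller's lists are not modified.
    mixed = id_zh * upsample_target + list(ms_zh)
    # Shuffle with a fixed seed so both languages are interleaved reproducibly.
    random.Random(seed).shuffle(mixed)
    return mixed


if __name__ == "__main__":
    # Toy, illustrative pairs; not data from the paper.
    id_zh = [("Saya suka kopi.", "我喜欢咖啡。")]
    ms_zh = [("Saya suka kopi.", "我喜欢咖啡。"), ("Terima kasih.", "谢谢。")]
    for src, tgt in mix_cognate_corpus(id_zh, ms_zh, upsample_target=2):
        print(src, "\t", tgt)
```

The toy pairs hint at why such mixing might help: the Malay and Indonesian source sentences can be spelled identically while sharing the same Chinese translation, which matches the abstract's explanation of morphological similarity and semantic equivalence.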
Authors: WANG Lin (王琳); LIU Wuying (刘伍颖) (Xianda College of Economics and Humanities, Shanghai International Studies University, Shanghai 200083, China; Shandong Key Laboratory of Language Resources Development and Application, Ludong University, Yantai, Shandong 264025, China)
Source: Journal of Chinese Information Processing (《中文信息学报》; indexed in CSCD and the Peking University core journals list), 2024, Issue 2, pp. 54-60 (7 pages)
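Funding: Ministry of Education Humanities and Social Sciences Research Youth Fund (20YJC740062); Ministry of Education Humanities and Social Sciences Research Planning Fund (20YJAZH069); Ministry of Education New Liberal Arts Research and Reform Practice Project (2021060049); Shanghai Philosophy and Social Sciences 13th Five-Year Plan Project (2019BYY028); Shandong Provincial Graduate Education Reform Research Project (SDYJG21185); Shandong Provincial Key Project for Undergraduate Teaching Reform Research (Z2021323).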
Keywords: cognate corpus; data augmentation; low-resource machine translation; Indonesian; Malay