Abstract
[Objective] Transfer learning is an effective way to improve low-resource neural machine translation. However, existing transfer learning methods perform poorly when transferring from Thai to Lao. The main problem is that Thai and Lao use different writing systems, which makes it difficult to establish an accurate vocabulary mapping for transfer. [Methods] We propose a Lao-Chinese neural machine translation method that enhances word embedding transfer through encoding transcription. The method exploits the phonetic similarity between Thai and Lao to build a unified romanization transcription rule, transcribes both languages with that rule, and thereby establishes an accurate vocabulary mapping, which in turn enables enhanced word embedding transfer from Thai to Lao. [Results] Experiments show that, compared with the baseline model, the proposed method improves BLEU by 2.45 and 2.74 in the Lao-Chinese and Lao-English translation directions, respectively. [Conclusion] The proposed romanization-enhanced word embedding transfer method performs well in transfer learning between low-resource languages and may offer an effective solution for Lao-Chinese neural machine translation.
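The idea of mapping vocabularies through a shared romanization can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the two five-character transcription tables, the helper names `romanize` and `transfer_embeddings`, and the random fallback initialization are hypothetical and are not the transcription rules or the implementation used in the paper. The sketch only shows the general pattern: transcribe Thai and Lao tokens into a common romanized form, then copy the Thai (parent) embedding to every Lao (child) token whose romanized form matches.

```python
import random

# Toy transcription tables (hypothetical, NOT the paper's actual rules).
# Characters are given as Unicode escapes to avoid font or encoding issues.
THAI_TO_ROMAN = {
    "\u0e01": "k",   # THAI CHARACTER KO KAI
    "\u0e14": "d",   # THAI CHARACTER DO DEK
    "\u0e21": "m",   # THAI CHARACTER MO MA
    "\u0e32": "aa",  # THAI CHARACTER SARA AA
    "\u0e35": "ii",  # THAI CHARACTER SARA II
}
LAO_TO_ROMAN = {
    "\u0e81": "k",   # LAO LETTER KO
    "\u0e94": "d",   # LAO LETTER DO
    "\u0ea1": "m",   # LAO LETTER MO
    "\u0eb2": "aa",  # LAO VOWEL SIGN AA
    "\u0eb5": "ii",  # LAO VOWEL SIGN II
}


def romanize(token, table):
    """Transcribe a token character by character; unknown characters pass through."""
    return "".join(table.get(ch, ch) for ch in token)


def transfer_embeddings(thai_vocab, thai_emb, lao_vocab, dim, seed=0):
    """Initialize Lao embeddings from Thai ones via matching romanized forms.

    Returns the Lao embedding list and the set of Lao tokens that were matched.
    Unmatched tokens get a small random vector (an assumed fallback).
    """
    rng = random.Random(seed)
    # Index Thai tokens by romanized form; the first occurrence wins.
    roman_to_thai = {}
    for idx, tok in enumerate(thai_vocab):
        roman_to_thai.setdefault(romanize(tok, THAI_TO_ROMAN), idx)

    lao_emb, matched = [], set()
    for tok in lao_vocab:
        idx = roman_to_thai.get(romanize(tok, LAO_TO_ROMAN))
        if idx is not None:
            lao_emb.append(list(thai_emb[idx]))  # copy the parent vector
            matched.add(tok)
        else:
            lao_emb.append([rng.uniform(-0.1, 0.1) for _ in range(dim)])
    return lao_emb, matched


# Usage: two Thai tokens ("maa", "dii") and three Lao tokens ("maa", "dii", "kaa").
thai_vocab = ["\u0e21\u0e32", "\u0e14\u0e35"]
thai_emb = [[1.0, 2.0], [3.0, 4.0]]
lao_vocab = ["\u0ea1\u0eb2", "\u0e94\u0eb5", "\u0e81\u0eb2"]
lao_emb, matched = transfer_embeddings(thai_vocab, thai_emb, lao_vocab, dim=2)
print(lao_emb[0], lao_emb[1], len(matched))
```

In this toy setup the first two Lao tokens inherit the Thai vectors because their romanized forms coincide, while the third has no Thai counterpart and is initialized randomly. A real system would need a far more complete transcription rule and would operate on subword vocabularies rather than whole words.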
Authors
TANG Cong (唐聪)
MAO Cunli (毛存礼)
GAO Shengxiang (高盛祥)
ZHANG Siqi (张思琦)
WANG Zhenhan (王振晗)
Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, China; Yunnan Key Laboratory of Artificial Intelligence, Kunming University of Science and Technology, Kunming 650500, China
Source
Journal of Xiamen University (Natural Science) (《厦门大学学报(自然科学版)》)
CAS
CSCD
Peking University Core Journal
2024, Issue 6, pp. 1016-1023 (8 pages)
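Funding
National Natural Science Foundation of China (62166023, U21B2027, 61972186)
Major Science and Technology Projects of Yunnan Province (202103AA080015, 202203AA080004, 202302AD080003)
Yunnan Fundamental Research Projects (202301AT070471)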
Keywords
transfer learning
Thai
Lao
romanization
machine translation