Abstract
To exploit the grammatical characteristics of Tibetan function words, this paper proposes a knowledge-fusion method for Tibetan function words that improves Tibetan-Chinese machine translation. First, four corresponding corpora are built with four fusion schemes: fusion of all Tibetan function words, fusion with multi-category (兼类) function words filtered, fusion of monosyllabic function words, and fusion of multisyllabic function words. Second, the corpora are used in experiments with the Transformer and mBART models, and ensembling over training rounds and ensembling of different network architectures are applied to improve the generalization ability of the final model. Comparative experiments show that the function-word knowledge-fusion algorithm and the model-ensembling strategy improve Tibetan-Chinese machine translation quality, reaching a BLEU score of up to 38.05.
Authors
YAN Songsi (严松思); ZHU Jie (珠杰); WANG Chao (汪超); LIU Yashan (刘亚姗); XU Zezhou (许泽洲); XU Zehui (徐泽辉)
School of Information Science and Technology, Tibet University, Lhasa 540000, China; Provincial and Ministerial Collaborative Innovation Centre for Informatization in Tibet, Lhasa 540000, China
Source
Journal of Minzu University of China (Natural Sciences Edition) (《中央民族大学学报(自然科学版)》)
2024, Issue 1, pp. 20-27 (8 pages)
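Funding
National Natural Science Foundation of China (62066042)
Humanities and Social Sciences Research Project of the Ministry of Education (21YJCZH059)
2021 Humanities and Social Sciences Research Project for Universities of the Tibet Autonomous Region (SK2021-24)
Tibet University Enhancement Program Project (ZDTSJH21-07)
Tibet University Cultivation Program Project (ZDCZJH21-10)
Tibet University Everest Discipline Construction Program Project (zf22002001)
Tibet University High-Level Project for the 2020 Cohort (2020-GSP-S176)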
Keywords
knowledge fusion of Tibetan function words
machine translation
model integration