Abstract
Quality estimation is an important subtask of machine translation that aims to assess the quality of machine translation output without relying on reference translations. Quality estimation currently performs well for Chinese-English and English-German machine translation, where the technology is relatively mature. However, applying these models to Chinese-Vietnamese neural machine translation still raises many problems. In particular, the linguistic features that a quality estimation model extracts from Chinese-Vietnamese parallel data do not adequately reflect the linguistic characteristics of the two languages, and Chinese and Vietnamese also differ markedly in word order and syntactic structure. To address these problems, this paper uses a statistical alignment method to model the structural differences between Chinese and Vietnamese and extracts features that capture the linguistic differences between the two languages, in order to improve Chinese-Vietnamese quality estimation. Experimental results show that incorporating these linguistic difference features improves on the baseline model by 0.52 and 0.35 percentage points in the Chinese-to-Vietnamese and Vietnamese-to-Chinese directions, respectively.
Authors
邹翔
朱俊国
高盛祥
余正涛
杨福岸
ZOU Xiang; ZHU Jun-guo; GAO Sheng-xiang; YU Zheng-tao; YANG Fuan (Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, China; Yunnan Key Laboratory of Artificial Intelligence, Kunming University of Science and Technology, Kunming 650500, China)
Source
《小型微型计算机系统》
CSCD
PKU Core (Peking University Core Journals)
2022, Issue 7, pp. 1413-1418 (6 pages)
Journal of Chinese Computer Systems
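Funding
Supported by the National Natural Science Foundation of China (61732005, 61761026, 61672271, 61866020)
Supported by the National Key Research and Development Program of China (2019QY1802, 2019QY1801, 2019QY1800)
Supported by the Yunnan Provincial Major Science and Technology Special Plan Project (202002AD080001)
Supported by the Yunnan Provincial Talent Training Project (KKSY201903018)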
Keywords
quality estimation
Chinese-Vietnamese parallel data
linguistic characteristics
differentiation features
Chinese-Vietnamese neural machine translation