Abstract
Current machine translation systems are mainly optimized for, and evaluated on, Indo-European languages; little work targets Chinese. Moreover, the seq2seq model, the attention-based neural machine translation model that currently performs best in the field, does not account for grammatical transformation between languages. We propose an optimized English-Chinese translation model that uses different text preprocessing and embedding-layer parameter initialization methods, and that improves the seq2seq architecture by adding a transform layer between the encoder and the decoder to handle grammatical transformation. Preprocessing reduces the model's parameter size and training time by 20% and improves translation performance by 0.4 BLEU. The seq2seq model with the transform layer gains a further 0.7 to 1.0 BLEU. Experiments on English-Chinese translation tasks over corpora of different sizes show that, compared with the existing mainstream attention-based seq2seq model, the proposed model trains in the same time while improving translation performance by 1 to 2 BLEU.
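The abstract does not specify the internal design of the transform layer, so the sketch below is only a minimal PyTorch illustration under assumed choices: a GRU encoder and decoder, Luong-style attention, and a hypothetical two-layer feed-forward mapping standing in for the grammar-transformation layer. Class names such as `TransformLayer` and `Seq2SeqWithTransform`, and all hyperparameters, are illustrative and not taken from the paper.

```python
# Minimal sketch: seq2seq with an extra "transform layer" inserted between
# encoder and decoder. The feed-forward transform below is a hypothetical
# stand-in for the paper's grammar-transformation layer.
import torch
import torch.nn as nn

class TransformLayer(nn.Module):
    """Hypothetical grammar-transformation layer: remaps encoder states
    before they are consumed by the decoder's attention."""
    def __init__(self, hidden_size):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(hidden_size, hidden_size),
            nn.Tanh(),
            nn.Linear(hidden_size, hidden_size),
        )

    def forward(self, enc_outputs):              # (batch, src_len, hidden)
        return self.proj(enc_outputs)

class Seq2SeqWithTransform(nn.Module):
    def __init__(self, src_vocab, tgt_vocab, emb_size=256, hidden_size=512):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, emb_size)
        self.tgt_emb = nn.Embedding(tgt_vocab, emb_size)
        self.encoder = nn.GRU(emb_size, hidden_size, batch_first=True)
        self.transform = TransformLayer(hidden_size)
        self.decoder = nn.GRU(emb_size + hidden_size, hidden_size,
                              batch_first=True)
        self.attn = nn.Linear(hidden_size, hidden_size)  # Luong-style scoring
        self.out = nn.Linear(hidden_size, tgt_vocab)

    def forward(self, src, tgt):
        enc_out, hidden = self.encoder(self.src_emb(src))   # (B, S, H)
        enc_out = self.transform(enc_out)                   # grammar transform
        logits = []
        for t in range(tgt.size(1)):                        # teacher forcing
            emb = self.tgt_emb(tgt[:, t:t + 1])             # (B, 1, E)
            query = hidden[-1].unsqueeze(1)                 # (B, 1, H)
            scores = torch.bmm(self.attn(query),
                               enc_out.transpose(1, 2))     # (B, 1, S)
            context = torch.bmm(torch.softmax(scores, dim=-1),
                                enc_out)                    # (B, 1, H)
            dec_out, hidden = self.decoder(
                torch.cat([emb, context], dim=-1), hidden)
            logits.append(self.out(dec_out))
        return torch.cat(logits, dim=1)                     # (B, T, V)
```

Because the transform layer only reprojects the encoder outputs, it adds parameters and computation independent of the attention mechanism, which is consistent with the abstract's claim of unchanged training time at this scale.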
Authors
XIAO Xin-feng, LI Shi-jun, YU Wei, LIU Jie, LIU Bei-xiong
(Department of Mechanical and Electrical Engineering, Guangdong Polytechnic of Environmental Protection Engineering, Foshan 528216, China; School of Computer Science, Wuhan University, Wuhan 430079, China)
Source
Computer Engineering & Science (《计算机工程与科学》)
Indexed in CSCD; Peking University Core Journal (北大核心)
2019, Issue 7, pp. 1257-1265 (9 pages)
Funding
National Natural Science Foundation of China (61502350)
2017 Guangdong Provincial Key Platform and Major Scientific Research Project for Colleges and Universities (2017GKTSCX042)