Abstract
Automatic word alignment based on statistics and computation infers word correspondences from the frequency and distribution of words. It needs only a large corpus, not a machine-readable dictionary or linguistic knowledge, to find word correspondences between aligned sentences. Its drawback is that accuracy depends heavily on factors such as word frequency, language family, genre, and style. To address this weakness, this paper proposes a GIZA++-based manual Chinese-English word alignment method. The text is first pre-aligned with the GIZA++ tool, and the pre-alignment is then edited and corrected by hand. Experiments show that the method is much faster than purely unsupervised alignment and more accurate than other fully automatic word alignment methods.
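The two-stage workflow (GIZA++ pre-alignment, then human editing) can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' tool. It assumes GIZA++'s usual `*.A3.final` record layout: a comment line, a plain sentence line, then a line in which each word (starting with `NULL`) is followed by the positions it aligns to in `({ ... })`. The functions `parse_a3_pair` and `apply_manual_edits`, and the sample sentence pair, are hypothetical.

```python
import re
from typing import List, Set, Tuple

# A link (i, j): i = 1-based position in the annotated line (the one with "({ ... })"),
# j = 1-based position in the plain sentence line. Assumed GIZA++ A3.final layout.
Link = Tuple[int, int]

# One token followed by its aligned positions, e.g. "海南岛 ({ 3 4 })" or "NULL ({ })".
TOKEN_RE = re.compile(r"(\S+) \(\{ ([\d ]*)\}\)")


def parse_a3_pair(plain_line: str, annotated_line: str) -> Tuple[List[str], List[str], Set[Link]]:
    """Stage 1: read the GIZA++ pre-alignment for one sentence pair (hypothetical helper)."""
    plain_tokens = plain_line.split()
    annotated_tokens: List[str] = []
    links: Set[Link] = set()
    for index, (word, positions) in enumerate(TOKEN_RE.findall(annotated_line)):
        if index == 0 and word == "NULL":
            # Words aligned to NULL are treated as unaligned, so no link is stored.
            continue
        annotated_tokens.append(word)
        for pos in positions.split():
            links.add((index, int(pos)))
    return annotated_tokens, plain_tokens, links


def apply_manual_edits(pre_aligned: Set[Link], remove: Set[Link], add: Set[Link]) -> Set[Link]:
    """Stage 2: the annotator drops wrong pre-alignment links, then adds missing ones."""
    return (pre_aligned - remove) | add


if __name__ == "__main__":
    plain = "I love Hainan island"
    annotated = "NULL ({ }) 我 ({ 1 }) 爱 ({ 2 }) 海南岛 ({ 3 })"
    zh, en, links = parse_a3_pair(plain, annotated)
    # Suppose the annotator notices that "island" (position 4) was left unaligned.
    corrected = apply_manual_edits(links, remove=set(), add={(3, 4)})
    print(sorted(corrected))  # [(1, 1), (2, 2), (3, 3), (3, 4)]
```

Keeping the links as a set makes the human correction step a simple set difference and union, so the annotator's work is limited to the links GIZA++ got wrong or missed.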
Source
Journal of Hainan Radio & TV University (《海南广播电视大学学报》)
2017, No. 4, pp. 7-11 (5 pages)
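Funding
One of the outcomes of the 2016 Hainan Provincial Natural Science Foundation project "Research on a Chinese-English Automatic Word Alignment System Fusing Multiple Remappings Based on Multiple Preprocessing Mechanisms: A Case Study of Building an Online Parallel Corpus of Chinese-English Translations of Hainan Tourism Texts" (No. 20167238)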
Keywords
automatic word alignment
GIZA++
manual word alignment