Abstract
This paper proposes a method for retrieving translation sentence pairs based on word-level information. Because the words of a source sentence and those of its translation do not map one-to-one, deciding whether two sentences are translations of each other does not require every word to form a translation pair. We introduce the concept of Word Information (WI) to measure the importance of a word in a sentence. WI is composed of word frequency, the distribution of the word across documents, part of speech (POS), and word length. When judging whether two sentences stand in a translation relation, only words with high WI values are checked for translation pairs. We build a translation sentence pair retrieval system on these high-WI word pairs. Experiments show that the system outperforms a simple retrieval method based on all words, and that in noise experiments it is more robust than related work.
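The abstract names the four components of WI but does not give the formula, so the following is only a minimal sketch of the general idea, not the authors' method. The multiplicative combination, the POS weights, the Universal-Dependencies-style tags, the cutoff `k`, and the helper names `word_information`, `top_wi_words`, and `match_score` are all assumptions made for illustration.

```python
import math

# Hypothetical POS weights: content words are assumed to be more informative.
POS_WEIGHT = {"NOUN": 1.0, "VERB": 0.8, "ADJ": 0.6, "ADV": 0.4}
DEFAULT_POS_WEIGHT = 0.1  # function words and any unlisted tag


def word_information(word, pos, term_freq, doc_freq, n_docs):
    """Toy WI score built from the four factors named in the abstract.

    The way the factors are combined here is an assumption.
    """
    rarity = 1.0 / math.log(2.0 + term_freq.get(word, 0))  # word frequency
    spread = math.log((1.0 + n_docs) / (1.0 + doc_freq.get(word, 0))) + 1.0  # distribution
    pos_weight = POS_WEIGHT.get(pos, DEFAULT_POS_WEIGHT)  # part of speech
    length = math.log(1.0 + len(word))  # word length
    return rarity * spread * pos_weight * length


def top_wi_words(tagged_sentence, term_freq, doc_freq, n_docs, k=3):
    """Return the k words of a (word, POS) sentence with the highest WI."""
    scored = [
        (word_information(w, p, term_freq, doc_freq, n_docs), w)
        for w, p in tagged_sentence
    ]
    scored.sort(key=lambda pair: -pair[0])
    return [w for _, w in scored[:k]]


def match_score(tagged_source, target_words, lexicon, term_freq, doc_freq, n_docs, k=3):
    """Fraction of high-WI source words that have a translation in the target.

    `lexicon` is a hypothetical bilingual dictionary: word -> set of translations.
    """
    keywords = top_wi_words(tagged_source, term_freq, doc_freq, n_docs, k)
    if not keywords:
        return 0.0
    target = set(target_words)
    hits = sum(1 for w in keywords if lexicon.get(w, set()) & target)
    return hits / len(keywords)


# Made-up corpus statistics and a made-up two-entry lexicon.
term_freq = {"the": 1000, "is": 800, "useful": 20, "corpus": 5, "retrieval": 3}
doc_freq = {"the": 100, "is": 100, "useful": 10, "corpus": 4, "retrieval": 2}
source = [("the", "DET"), ("corpus", "NOUN"), ("retrieval", "NOUN"),
          ("is", "VERB"), ("useful", "ADJ")]
lexicon = {"retrieval": {"检索"}, "corpus": {"语料库"}}
candidate = ["语料库", "检索", "很", "有用"]
score = match_score(source, candidate, lexicon, term_freq, doc_freq, 100, k=2)
```

In this sketch a candidate sentence would be accepted when `score` exceeds a chosen threshold. Frequent function words such as "the" and "is" receive low scores and therefore never influence the decision, which is the behavior the abstract describes for low-WI words.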
Source
《外语教学与研究》
CSSCI
PKU Core Journal (北大核心)
2012, Issue 2, pp. 270-278, 321 (9 pages in total)
Foreign Language Teaching and Research
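Funding
National Social Science Fund of China project "Research on Automatic Identification of Chinese-English Cross-Language Plagiarized Text" (10CYY024)
Supported by the National Social Science Fund of China major project "Construction and Processing of a Large-Scale English-Chinese Parallel Corpus" (10&ZD127)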