
Approach of Large-Scale Sentence Similarity Computation  Cited by: 6
Abstract: Retrieving from a large-scale corpus the translation examples most similar to a source-language (SL) sentence, that is, computing sentence similarity, is one of the key problems in example-based machine translation (EBMT). This paper proposes a multi-layer sentence similarity computation approach. First, a small number of candidate translation examples are selected from a large-scale corpus on the basis of the surface features and information entropy of the words in the input sentence. Second, the degree of generalization matching between the input sentence and each candidate is computed. Finally, sentence similarity is computed from the outcomes of these two steps. Experiments on the hybrid-strategy machine translation system IHSMTS show that, on a corpus of 200,000 English-Chinese sentence pairs, the system retrieves similar sentences with a recall of 96% and a precision of 90%, which demonstrates the effectiveness of the proposed algorithm.
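The abstract describes the method only in outline. The Python sketch below illustrates that two-stage outline under several assumptions and is not the authors' implementation. The abstract mentions "information entropy" without a formula, so word weights here are assumed to be the self-information -log2(df/N) computed from document frequency. Generalization is assumed to replace digits with `<NUM>` and words in a small hypothetical class table `WORD_CLASSES` with class labels. Generalized matching is assumed to be a normalized token-level edit distance, and the final score is an assumed linear mix with a hypothetical weight `alpha`. All names (`SimilarityIndex`, `candidates`, `search`, `generalize`) are illustrative.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical word-class table for generalization (ASSUMPTION: the paper's
# actual generalization rules are not given in the abstract).
WORD_CLASSES = {"monday": "<DAY>", "tuesday": "<DAY>", "friday": "<DAY>"}


def tokenize(sentence):
    """Lowercase and split into word tokens (a simple surface-feature view)."""
    return re.findall(r"[a-z0-9]+", sentence.lower())


def generalize(tokens):
    """Replace numbers and known words with class labels (assumed scheme)."""
    return ["<NUM>" if t.isdigit() else WORD_CLASSES.get(t, t) for t in tokens]


def edit_distance(a, b):
    """Token-level Levenshtein distance using a single rolling row."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            # deletion, insertion, substitution (cost 0 if tokens match)
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (x != y)))
        prev = cur
    return prev[-1]


class SimilarityIndex:
    def __init__(self, corpus):
        self.corpus = corpus
        self.docs = [tokenize(s) for s in corpus]
        self.index = defaultdict(set)  # word -> ids of examples containing it
        for i, toks in enumerate(self.docs):
            for tok in toks:
                self.index[tok].add(i)
        n = len(corpus)
        # ASSUMPTION: self-information -log2(df/N); rarer words weigh more.
        self.weight = {w: -math.log2(len(ids) / n) for w, ids in self.index.items()}

    def candidates(self, tokens, k):
        """Stage 1: rank examples by the share of query word weight they cover."""
        query = set(tokens)
        # Words absent from the corpus get weight 0 here (a simplification).
        total = sum(self.weight.get(w, 0.0) for w in query) or 1.0
        scores = Counter()
        for w in query:
            for i in self.index.get(w, ()):
                scores[i] += self.weight[w] / total
        return scores.most_common(k)

    def search(self, sentence, k=10, alpha=0.5):
        """Stage 2: rescore candidates by generalized edit-distance similarity."""
        toks = tokenize(sentence)
        gen_q = generalize(toks)
        results = []
        for i, s1 in self.candidates(toks, k):
            gen_c = generalize(self.docs[i])
            s2 = 1 - edit_distance(gen_q, gen_c) / max(len(gen_q), len(gen_c), 1)
            results.append((alpha * s1 + (1 - alpha) * s2, self.corpus[i]))
        return sorted(results, reverse=True)


idx = SimilarityIndex([
    "The meeting is on Monday at 3 pm.",
    "The meeting was cancelled.",
    "I bought 2 apples.",
])
best_score, best_example = idx.search("The meeting is on Friday at 5 pm.", k=2)[0]
print(best_example)  # prints "The meeting is on Monday at 3 pm."
```

The cheap inverted-index pass in stage 1 keeps the more expensive edit-distance comparison restricted to the top `k` candidates, which is the reason for splitting the computation into layers. Generalization lets "Friday ... 5" match "Monday ... 3" exactly, which plain word overlap would miss.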
Source: Journal of Chinese Information Processing, CSCD, Peking University core journal, 2006, Issue B03, pp. 47-52 (6 pages)
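Funding: National Natural Science Foundation of China (60502048, 60272088); National High Technology Research and Development Program of China (863 Program) (2002AA117010-02)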
Keywords: sentence similarity; example-based machine translation; hybrid-strategy machine translation; generalization matching