An analysis method of cross-lingual literature similarity
(original title: 跨语言文献相似度的分析方法)

Cited by: 2
Abstract: This paper analyses sentence-aligned documents in different languages and proposes a method for computing cross-lingual literature similarity based on a multilingual topic correlation model. First, a data model is built for the collected documents in three languages (Chinese, English, and Korean): word segmentation, correction and selection of the segmentation results, and term-weight calculation are applied as preprocessing steps to construct a term-document matrix. Second, a multilingual topic semantic space is established, and the documents, translated into the three languages, are mapped into it; every topic in this space is made up of terms from all three languages. Finally, the similarity between documents in different languages is computed and compared through their corresponding topics in the semantic space. Experimental results show that the similarity between documents in different languages can be computed directly in the semantic space with an accuracy above 90%, which verifies the effectiveness of the method for cross-lingual literature similarity calculation.
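Source: Journal of Yanbian University (Natural Science Edition) (《延边大学学报(自然科学版)》), CAS, 2016, No. 2, pp. 151-155 (5 pages)
Funding: Jilin Province Science and Technology Development Plan Project (20130101179JC-18); Jilin Province Public Computing Platform; Yanbian University Science and Technology Development Plan Project (延大科合字[2014]第16号)
Keywords: multilingual topic correlation model; cross-lingual; semantic similarity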
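
The three stages named in the abstract (weighted term-document matrix, shared multilingual semantic space, similarity in that space) can be illustrated with a minimal sketch. The abstract does not specify the topic model itself, so the code below swaps in plain latent semantic analysis (a truncated SVD over a joint term-document matrix built from aligned documents) as a stand-in. The toy corpus, the smoothed IDF weighting, the choice of `k = 2`, and every function name are assumptions made for illustration, not the authors' implementation. Documents are assumed to be segmented into tokens already.

```python
import numpy as np

# Toy aligned corpus (hypothetical): each entry is one document in three
# languages, already segmented into tokens.
aligned = [
    {"zh": ["主题", "模型"], "en": ["topic", "model"], "ko": ["주제", "모델"]},
    {"zh": ["文献", "检索"], "en": ["document", "retrieval"], "ko": ["문헌", "검색"]},
    {"zh": ["主题", "检索"], "en": ["topic", "retrieval"], "ko": ["주제", "검색"]},
]

def build_vocab(corpus):
    """Index every (language, token) pair, so one vocabulary spans all languages."""
    vocab = {}
    for doc in corpus:
        for lang, tokens in doc.items():
            for tok in tokens:
                vocab.setdefault((lang, tok), len(vocab))
    return vocab

def count_vector(tokens_by_lang, vocab):
    """Raw term counts over the joint vocabulary; unknown tokens are ignored."""
    v = np.zeros(len(vocab))
    for lang, tokens in tokens_by_lang.items():
        for tok in tokens:
            idx = vocab.get((lang, tok))
            if idx is not None:
                v[idx] += 1.0
    return v

# Stage 1: weighted term-document matrix (terms x documents), smoothed IDF.
vocab = build_vocab(aligned)
counts = np.column_stack([count_vector(d, vocab) for d in aligned])
df = (counts > 0).sum(axis=1)
idf = np.log((1 + counts.shape[1]) / (1 + df)) + 1.0
X = counts * idf[:, None]

# Stage 2: shared semantic space. Each column of U_k is a "topic" whose
# loadings cover terms from all three languages.
U, S, Vt = np.linalg.svd(X, full_matrices=False)
k = 2  # number of topics kept (assumption)
U_k, S_k = U[:, :k], S[:k]

def to_topic_space(tokens, lang):
    """Fold a monolingual document into the shared topic space."""
    d = count_vector({lang: tokens}, vocab) * idf
    return (U_k.T @ d) / S_k

def cosine(a, b):
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0

# Stage 3: compare documents written in different languages.
zh_doc = to_topic_space(["主题", "模型"], "zh")
en_same = to_topic_space(["topic", "model"], "en")
en_other = to_topic_space(["document", "retrieval"], "en")
print("zh vs matching en:", cosine(zh_doc, en_same))
print("zh vs unrelated en:", cosine(zh_doc, en_other))
```

In this sketch a Chinese document and its English counterpart land on the same point in the topic space, so their cosine similarity is higher than that of an unrelated pair. A probabilistic multilingual topic model would replace the SVD step but could keep the same fold-in and comparison structure.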