
Chinese-Vietnamese Word Similarity Computation Based on Wikipedia (基于维基百科的汉越词语相似度计算)
Cited by: 1
Abstract To address the problem of computing cross-lingual word similarity between Chinese and Vietnamese, a Wikipedia-based method is proposed. It uses Wikipedia's multilingual concept pages as a bridge and exploits three kinds of information: the translation correspondence between concepts, the occurrence of words in different concept pages, and the co-occurrence of words with other concepts. The method first extracts the set of concepts that have corresponding Chinese and Vietnamese pages in Wikipedia and uses it to construct a bilingual concept feature space. It then builds a concept vector for each word from the frequency of the word in the description text of each concept and from the co-occurrence of the word and the concept in the texts of other concepts. Finally, the similarity of two words is computed as the cosine of the angle between their concept vectors. Experimental results indicate that the proposed method performs well on Chinese-Vietnamese word similarity computation and that the concept co-occurrence relationship improves the accuracy of word similarity.
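The pipeline described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the exact feature weighting, so everything specific here is an assumption made for illustration: the helper names `concept_vector` and `cosine`, the mixing weight `alpha`, the way term frequency and co-occurrence counts are combined, and the tiny pre-tokenized toy corpus. Concepts are aligned across the two languages by list position, standing in for Wikipedia's cross-language links.

```python
import math


def concept_vector(word, concepts, alpha=0.5):
    """Build one vector component per aligned concept (hypothetical weighting).

    concepts: list of (title, tokens) pairs for one language, ordered so that
    index i refers to the same concept in both languages.
    """
    vector = []
    for i, (title, tokens) in enumerate(concepts):
        # Word-frequency feature: relative frequency of the word in the concept's own text.
        tf = tokens.count(word) / len(tokens) if tokens else 0.0
        # Co-occurrence feature: number of OTHER concept texts in which the word
        # appears together with this concept's title.
        cooc = sum(
            1
            for j, (_, other_tokens) in enumerate(concepts)
            if j != i and word in other_tokens and title in other_tokens
        )
        # Assumed combination: tf plus alpha times the co-occurrence count.
        vector.append(tf + alpha * cooc)
    return vector


def cosine(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Toy aligned concept pages (made-up data, already tokenized).
zh_concepts = [
    ("猫", ["猫", "是", "动物"]),
    ("狗", ["狗", "和", "猫", "都", "是", "动物"]),
    ("汽车", ["汽车", "是", "车辆"]),
]
vi_concepts = [
    ("mèo", ["mèo", "là", "động_vật"]),
    ("chó", ["chó", "và", "mèo", "đều", "là", "động_vật"]),
    ("ô_tô", ["ô_tô", "là", "xe"]),
]

zh_animal = concept_vector("动物", zh_concepts)
sim_related = cosine(zh_animal, concept_vector("động_vật", vi_concepts))
sim_unrelated = cosine(zh_animal, concept_vector("xe", vi_concepts))
print(sim_related, sim_unrelated)
```

In this toy setting the Chinese word for "animal" scores higher against its Vietnamese counterpart than against the Vietnamese word for "vehicle", because the first pair is distributed over the same aligned concepts. Setting `alpha=0` switches off the co-occurrence feature, which gives a simple way to compare the two feature sets on real data.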
Source Journal of Nanjing University of Science and Technology (《南京理工大学学报》), indexed in EI, CAS, CSCD, and the Peking University Core Journals list, 2016, No. 4, pp. 461-466 (6 pages)
Funding National Natural Science Foundation of China (61175068, 61472168); Key Project of the Natural Science Foundation of Yunnan Province (2013FA030)
Keywords Chinese; Vietnamese; word similarity; Wikipedia; concept; co-occurrence relationship; correspondence relationship; word frequency