
Research of a New Algorithm of Words Similarity Based on Information Entropy (cited by: 3)
Abstract: Word similarity computation is widely used in natural language processing. To make its results more reasonable, this paper studies the three levels of HowNet (words, concepts and sememes) and proposes a new word similarity method that draws on the notion of entropy from information theory. First, surface similarity of words is introduced to select words from the word set in a reasonable way. Second, the weight of each sememe is balanced according to its information entropy, which keeps common sememes from taking up too large a share of a word's sememe set and causing results that deviate clearly from the real situation. Experimental results show that, compared with traditional methods, the proposed method does not produce overly absolute values such as 1.000, so its results are more reasonable. In addition, the experiments operate on word sets rather than on pairs of words, which indicates that comparison efficiency is also improved.
Source: Computer Technology and Development (《计算机技术与发展》), 2015, No. 9, pp. 119-122 (4 pages)
Funding: Key Natural Science Research Project of Anhui Provincial Universities (KJ2013Z023, KJ2013A058); Anhui Province Revitalization Plan Funded Project (2013ZDJY073)
Keywords: word similarity; HowNet; sememe; information entropy; similarity of words surface
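The abstract describes down-weighting common sememes but does not give the paper's formulas, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes each sememe is weighted by its smoothed self-information, -log2(count / (n + 1)), and that two words are compared by a weighted Jaccard overlap of their sememe sets. The toy lexicon, its sememe labels, and the function names `sememe_weights` and `weighted_similarity` are all hypothetical; the surface-similarity selection step is not shown.

```python
import math
from collections import Counter

# Hypothetical toy lexicon: word -> set of sememes (not taken from HowNet).
LEXICON = {
    "doctor": {"human", "occupation", "cure"},
    "nurse": {"human", "occupation", "cure", "assist"},
    "teacher": {"human", "occupation", "teach"},
    "student": {"human", "study"},
    "hospital": {"institution", "cure"},
}


def sememe_weights(lexicon):
    """Assumed weighting: smoothed self-information of each sememe.

    A sememe that occurs in many words gets a small weight; a rare one
    gets a large weight. The +1 keeps every weight strictly positive.
    """
    n = len(lexicon)
    counts = Counter(s for sememes in lexicon.values() for s in sememes)
    return {s: -math.log2(c / (n + 1)) for s, c in counts.items()}


def weighted_similarity(word_a, word_b, lexicon, weights):
    """Assumed similarity: weighted Jaccard overlap of two sememe sets."""
    a, b = lexicon[word_a], lexicon[word_b]
    shared = sum(weights[s] for s in a & b)
    total = sum(weights[s] for s in a | b)
    return shared / total if total > 0 else 0.0


WEIGHTS = sememe_weights(LEXICON)

if __name__ == "__main__":
    for pair in [("doctor", "nurse"), ("doctor", "student")]:
        print(pair, round(weighted_similarity(*pair, LEXICON, WEIGHTS), 3))
```

Under this assumed weighting, a pair of words that shares only a very common sememe such as "human" scores low, while distinct words never reach exactly 1.0 unless their sememe sets are identical.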
