Abstract
Word semantic similarity computation is widely used in many fields related to natural language processing (NLP). Existing HowNet-based methods do not adequately account for the distance, depth, and density of sememes within the same sememe hierarchy tree, or for the primary and secondary relationships among sememes, so their similarity results are not sufficiently accurate. To address this problem, this paper proposes an improved word semantic similarity algorithm based on HowNet. By analyzing the sense expressions and the sememe hierarchy tree in HowNet, the algorithm replaces the maximum sense similarity with a weighted average over the set of sense similarities, introduces sememe density into a new edge weight function, and uses weight factors to limit the influence of sememe depth and sememe density on the accuracy of the similarity computation. Experimental results show that the improved algorithm effectively raises the accuracy of word semantic similarity, achieves fairly satisfactory results, and is more reasonable than existing methods.
Authors
WANG Hui
Marius Petrescu
PAN Junhui
WANG Haochang
ZHANG Qiang
Affiliations: Department of Computer and Information Technology, Northeast Petroleum University, Daqing 163318; Petroleum-Gas University of Ploiesti, Ploiesti 100680
Source
Computer & Digital Engineering
2022, No. 2, pp. 225-228, 293 (5 pages in total)
Funding
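Supported by the National Natural Science Foundation of China (Nos. 61402099, 61702093)
Natural Science Foundation of Heilongjiang Province (No. 2018003)
Northeast Petroleum University Guiding Innovation Fund (No. 2020YDL-18).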
Keywords
HowNet
semantic similarity of words
density of sememe
depth of sememe