Abstract
To address the problem of computing semantic similarity between words, this paper builds on Tongyici Cilin (《同义词词林》, CiLin) and analyzes, from a linguistic perspective, how words are organized in CiLin. It argues that the depth of the parent node plays a decisive role in semantic similarity. The paper also reports the distribution of nodes at each level and the distribution of atomic word group sizes. It proposes two computation models: one that uses only the parent node depth, and one that combines the parent node depth with that node's branch information. The similarities computed by these two models reach Pearson correlation coefficients of 0.854 and 0.857, respectively, with Miller's human-annotated ratings. The corresponding root square errors are 1.003 and 0.991.
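The abstract does not give the models' formulas. The Python sketch below therefore only illustrates the two ingredients the abstract names:

- the depth of the parent node shared by two CiLin entries;
- the agreement between computed scores and human ratings.

The sketch rests on several assumptions:

- Codes follow the 8-character format of the HIT extended edition of CiLin (e.g. `Aa01A01=`, five coded levels, the last being the atomic word group).
- `depth_similarity` is a hypothetical linear mapping onto an assumed 0-4 rating scale. It is not the authors' model.
- `rmse` reads the paper's "root square error" as root-mean-square error.
- All function names are illustrative.

```python
import math
from typing import Sequence

# Assumption: codes follow the 8-character HIT extended CiLin format, e.g. "Aa01A01=".
# Level 1: 1 upper-case letter; level 2: 1 lower-case letter; level 3: 2 digits;
# level 4: 1 upper-case letter; level 5 (atomic word group): 2 digits.
# The trailing flag character ('=', '#', '@') is ignored in this sketch.
LEVEL_SLICES = [(0, 1), (1, 2), (2, 4), (4, 5), (5, 7)]


def parent_depth(code_a: str, code_b: str) -> int:
    """Depth of the deepest node shared by two codes (0 means only the root is shared)."""
    depth = 0
    for start, end in LEVEL_SLICES:
        if code_a[start:end] != code_b[start:end]:
            break
        depth += 1
    return depth


def depth_similarity(code_a: str, code_b: str, max_score: float = 4.0) -> float:
    """Hypothetical linear mapping of parent depth onto 0..max_score (not the paper's formula)."""
    return max_score * parent_depth(code_a, code_b) / len(LEVEL_SLICES)


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 items")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sd_x * sd_y)


def rmse(predicted: Sequence[float], gold: Sequence[float]) -> float:
    """Root-mean-square error (assumed reading of the paper's 'root square error')."""
    if len(predicted) != len(gold) or not predicted:
        raise ValueError("need two non-empty equal-length lists")
    return math.sqrt(sum((p - g) ** 2 for p, g in zip(predicted, gold)) / len(predicted))
```

For example, `parent_depth("Aa01A01=", "Aa01A02=")` returns 4. The two codes agree on the first four levels and differ only in the atomic word group digits. These codes are illustrative and are not taken from the paper. A model that also uses branch information would replace `depth_similarity` while keeping the same evaluation functions.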
Authors
YANG Quan (杨泉)
SUN Yuquan (孙玉泉)
(College of Chinese Language and Culture, Beijing Normal University, Beijing 100875, China; School of Mathematical Sciences, Beihang University, Beijing 100191, China)
Source
Computer Engineering and Applications (《计算机工程与应用》)
CSCD (Chinese Science Citation Database)
Peking University Core Journal (北大核心)
2020, No. 17, pp. 48-54 (7 pages)
Funding
Research Project of the State Language Commission of China (No. YB135-91).