Abstract
Word sense induction is an important research topic in the acquisition of word sense knowledge, and clustering is currently the most widely used approach to it. By comparing the strengths of the K-Means and EM clustering algorithms in their respective word sense induction models, this paper proposes a new clustering algorithm that fuses a distance metric with a Gaussian mixture model. The aim is to exploit the advantages of the two algorithms in distance measurement and in modeling the data distribution, respectively, so as to bring out the role that the geometric properties and normal distribution information of the data play in word sense clustering, and thereby improve the performance of the word sense induction model. Experimental results show that the proposed hybrid clustering algorithm is effective in improving the performance of word sense induction models.
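The abstract does not say how the distance metric and the Gaussian mixture model are fused, so the following is only a minimal sketch of one common combination, not the authors' algorithm. It assumes that K-Means (Euclidean distance) first produces hard clusters, and that these clusters then initialize an EM fit of a spherical Gaussian mixture. All function names, the farthest-first initialization, and the toy 2-D "context vectors" are hypothetical choices made for illustration.

```python
import math

def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def farthest_first(points, k):
    """Deterministic init (an assumption of this sketch): start from the
    first point, then repeatedly add the point farthest from all centroids."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        nxt = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(nxt))
    return centroids

def kmeans(points, k, iters=20):
    """Distance-based stage: plain K-Means, returns centroids and hard labels."""
    dim = len(points[0])
    centroids = farthest_first(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j]))
                  for p in points]
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:  # keep the old centroid if a cluster empties
                centroids[j] = [sum(p[d] for p in members) / len(members)
                                for d in range(dim)]
    return centroids, labels

def log_gauss(p, mu, var):
    """Log density of a spherical Gaussian N(mu, var * I)."""
    return -0.5 * (len(p) * math.log(2 * math.pi * var) + sq_dist(p, mu) / var)

def gmm_em(points, centroids, labels, iters=30, min_var=1e-6):
    """Distribution-based stage: EM for a spherical Gaussian mixture,
    initialized from the K-Means result. Returns hard labels by posterior."""
    k, n, dim = len(centroids), len(points), len(points[0])
    mus = [list(c) for c in centroids]
    weights, variances = [], []
    for j in range(k):
        members = [p for p, l in zip(points, labels) if l == j]
        size = max(len(members), 1)
        weights.append(size / n)
        var = sum(sq_dist(p, mus[j]) for p in members) / (dim * size)
        variances.append(max(var, min_var))  # floor avoids zero variance
    resp = []
    for _ in range(iters):
        # E-step: posterior responsibility of each component for each point
        resp = []
        for p in points:
            logs = [math.log(weights[j]) + log_gauss(p, mus[j], variances[j])
                    for j in range(k)]
            top = max(logs)
            exps = [math.exp(v - top) for v in logs]  # stabilized softmax
            total = sum(exps)
            resp.append([e / total for e in exps])
        # M-step: re-estimate weights, means, and variances
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj < 1e-12:
                continue  # skip a component that has lost all its mass
            weights[j] = nj / n
            mus[j] = [sum(r[j] * p[d] for r, p in zip(resp, points)) / nj
                      for d in range(dim)]
            var = sum(r[j] * sq_dist(p, mus[j])
                      for r, p in zip(resp, points)) / (dim * nj)
            variances[j] = max(var, min_var)
    return [max(range(k), key=lambda j: r[j]) for r in resp]

def hybrid_cluster(points, k):
    """K-Means for geometry, then a GMM for the distributional refinement."""
    centroids, labels = kmeans(points, k)
    return gmm_em(points, centroids, labels)

# Toy stand-ins for context vectors of one ambiguous word (hypothetical data).
contexts = [(0, 0), (0, 1), (1, 0), (1, 1),
            (10, 10), (10, 11), (11, 10), (11, 11)]
print(hybrid_cluster(contexts, 2))
```

In this arrangement the distance-based stage supplies a geometric partition cheaply, and the mixture stage re-weights the assignment using each cluster's estimated spread. Whether the paper combines the two in this order, or in a single objective, cannot be determined from the abstract alone.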
Authors
张宜浩
刘智
朱常鹏
ZHANG Yi-hao, LIU Zhi, ZHU Chang-peng (College of Computer Science and Engineering, Chongqing University of Technology, Chongqing 400054, China)
Source
《计算机科学》
CSCD
Peking University Core Journal (北大核心)
2017, Issue 8, pp. 265-269 (5 pages)
Computer Science
Funding
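Science and Technology Research Project of the Chongqing Municipal Education Commission (kj1500920, kj1500916)
National Natural Science Foundation of China (61603065)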
Keywords
Word sense induction
Distance metric
Gaussian mixture model
Hybrid clustering