Abstract
Affinity Propagation (AP) clustering treats every data point in the dataset as a potential cluster center and builds its input similarity matrix from the Euclidean distance, which makes its performance highly sensitive to deformation in the data. To address this weakness, two alternative similarity measures are proposed for computing the similarity between two data points: the Minkowski and Chebyshev measures are each introduced into AP clustering in place of the original Euclidean distance when constructing the similarity matrix. The proposed methods are evaluated quantitatively on UCI machine learning datasets using the Jaccard index and the Fowlkes-Mallows index. The experimental results show that AP clustering based on the Minkowski and Chebyshev distances achieves higher overall accuracy than the conventional Euclidean-distance-based AP clustering.
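The abstract does not give the authors' exact formulation, so the following Python sketch only illustrates the idea under stated assumptions. Similarities are taken as the negative Minkowski or Chebyshev distance (Frey and Dueck's original AP uses the negative squared Euclidean distance), every preference s(k,k) is set to the median off-diagonal similarity, and clustering quality is scored with the pair-counting Jaccard and Fowlkes-Mallows indices. The damping of 0.7, the iteration limits, the helper names and the toy points are illustrative choices, not values from the paper; the sketch also omits the small random jitter that common implementations (for example scikit-learn) add to S to break ties.

```python
import math
from itertools import combinations
from statistics import median


def minkowski(x, y, p):
    """Minkowski distance of order p; p = 2 gives the Euclidean distance."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


def chebyshev(x, y):
    """Chebyshev distance: largest coordinate difference (the p -> infinity limit)."""
    return max(abs(a - b) for a, b in zip(x, y))


def similarity_matrix(points, dist):
    """s(i, k) = -dist(x_i, x_k); every preference s(k, k) is the median off-diagonal value."""
    n = len(points)
    S = [[-dist(points[i], points[k]) for k in range(n)] for i in range(n)]
    pref = median(S[i][k] for i in range(n) for k in range(n) if i != k)
    for k in range(n):
        S[k][k] = pref
    return S


def affinity_propagation(S, damping=0.7, max_iter=300, conv_iter=20):
    """Plain responsibility/availability message passing (assumes n >= 2).

    Returns, for each point, the index of its exemplar, or all -1 if none emerged.
    """
    n = len(S)
    R = [[0.0] * n for _ in range(n)]
    A = [[0.0] * n for _ in range(n)]
    exemplars, stable = [], 0
    for _ in range(max_iter):
        # r(i,k) = s(i,k) - max_{k' != k} [a(i,k') + s(i,k')], using the top two values per row
        for i in range(n):
            v = [A[i][k] + S[i][k] for k in range(n)]
            best = max(range(n), key=v.__getitem__)
            second = max(v[k] for k in range(n) if k != best)
            for k in range(n):
                new = S[i][k] - (second if k == best else v[best])
                R[i][k] = damping * R[i][k] + (1 - damping) * new
        # a(k,k) = sum_{i' != k} max(0, r(i',k))
        # a(i,k) = min(0, r(k,k) + sum_{i' not in {i,k}} max(0, r(i',k))) for i != k
        for k in range(n):
            pos = [max(0.0, R[i][k]) for i in range(n)]
            total = sum(pos) - pos[k]
            for i in range(n):
                new = total if i == k else min(0.0, R[k][k] + total - pos[i])
                A[i][k] = damping * A[i][k] + (1 - damping) * new
        # Points with positive evidence a(k,k) + r(k,k) are the current exemplars
        current = [k for k in range(n) if A[k][k] + R[k][k] > 0]
        stable = stable + 1 if current == exemplars else 0
        exemplars = current
        if exemplars and stable >= conv_iter:
            break
    if not exemplars:
        return [-1] * n
    # Exemplars label themselves; other points join the most similar exemplar
    return [i if i in exemplars else max(exemplars, key=lambda k: S[i][k]) for i in range(n)]


def pair_counts(truth, pred):
    """Count point pairs: same cluster in both (tp), only in pred (fp), only in truth (fn)."""
    tp = fp = fn = 0
    for i, j in combinations(range(len(truth)), 2):
        same_t, same_p = truth[i] == truth[j], pred[i] == pred[j]
        if same_t and same_p:
            tp += 1
        elif same_p:
            fp += 1
        elif same_t:
            fn += 1
    return tp, fp, fn


def jaccard(truth, pred):
    """Pair-counting Jaccard index: tp / (tp + fp + fn)."""
    tp, fp, fn = pair_counts(truth, pred)
    return tp / (tp + fp + fn) if tp + fp + fn else 0.0


def fowlkes_mallows(truth, pred):
    """Fowlkes-Mallows index: tp / sqrt((tp + fp) * (tp + fn))."""
    tp, fp, fn = pair_counts(truth, pred)
    denom = math.sqrt((tp + fp) * (tp + fn))
    return tp / denom if denom else 0.0


if __name__ == "__main__":
    pts = [(0, 0), (1, 0), (2, 0), (10, 10), (11, 10), (12, 10)]
    truth = [0, 0, 0, 1, 1, 1]
    metrics = {
        "euclidean": lambda x, y: minkowski(x, y, 2),
        "minkowski p=3": lambda x, y: minkowski(x, y, 3),
        "chebyshev": chebyshev,
    }
    for name, dist in metrics.items():
        labels = affinity_propagation(similarity_matrix(pts, dist))
        print(name, labels, jaccard(truth, labels), fowlkes_mallows(truth, labels))
```

Only `similarity_matrix` changes between the three variants; the message-passing loop is identical, which mirrors the abstract's point that the distance measure is swapped while the AP procedure itself is kept. Because Minkowski distance with p = 2 reduces to the Euclidean distance and Chebyshev distance is its p -> infinity limit, all three variants can be compared within one family of measures.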
Authors
WEN Aihong; XU Caocao (Engineering and Technical College, Chengdu University of Technology, Leshan 614007, China)
Source
Microcomputer Applications
2020, No. 9, pp. 173-176 (4 pages)
Keywords
data clustering
affinity propagation algorithm
Euclidean distance
similarity measure
cluster center