Abstract
The performance of unsupervised clustering algorithms depends critically on the distance metric that the user specifies over the input data set. This metric directly determines how the similarity between data samples is computed, so different distance metrics often produce significantly different clustering results. To address the problem of choosing a distance metric for spectral clustering, a spectral clustering algorithm based on distance metric learning with side-information is proposed. The algorithm learns a distance metric from side-information implicit in the data set itself, namely information on whether data samples drawn by sampling the data set are similar to one another. The learned metric is then used in the similarity function of the spectral clustering algorithm to construct its similarity matrix. Experiments on standard UCI data sets show that, compared with the standard spectral clustering algorithm, the proposed algorithm achieves significantly higher prediction accuracy.
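The abstract does not say which metric-learning procedure or which spectral clustering variant is used, so the following is only a minimal sketch under stated assumptions. It assumes (1) a diagonal Mahalanobis metric d(x, y)^2 = sum_k w_k (x_k - y_k)^2 whose weights come from a simple heuristic (mean squared difference over dissimilar pairs divided by that over similar pairs), standing in for a proper optimisation-based metric learner; and (2) the common normalized spectral clustering pipeline (Gaussian similarity matrix, D^{-1/2} W D^{-1/2}, top-k eigenvectors, row normalization, k-means). The names `learn_diagonal_metric`, `spectral_clustering`, `kmeans`, the kernel width `sigma`, and the toy data are hypothetical illustrations, not taken from the paper.

```python
import numpy as np


def learn_diagonal_metric(X, similar, dissimilar, eps=1e-12):
    """Learn per-feature weights w for d(x, y)^2 = sum_k w_k (x_k - y_k)^2.

    Hypothetical heuristic (not the paper's method): a feature gets a large
    weight when it differs a lot across dissimilar pairs but little across
    similar pairs.
    """
    S = np.array([X[i] - X[j] for i, j in similar])      # similar-pair differences
    D = np.array([X[i] - X[j] for i, j in dissimilar])   # dissimilar-pair differences
    w = (D ** 2).mean(axis=0) / ((S ** 2).mean(axis=0) + eps)
    # Rescale so that similar pairs have mean squared distance 1 under w.
    return w / ((S ** 2) @ w).mean()


def kmeans(E, k, n_iter=100):
    """Plain Lloyd's k-means with deterministic farthest-point initialisation."""
    centers = [E[0]]
    for _ in range(1, k):
        d = np.min([((E - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(E[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(n_iter):
        dist = ((E[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        new = np.array([E[labels == j].mean(axis=0) if np.any(labels == j)
                        else centers[j] for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels


def spectral_clustering(X, k, w, sigma=1.0):
    """Normalized spectral clustering whose Gaussian similarity uses the learned weights w."""
    diff = X[:, None, :] - X[None, :, :]
    d2 = (diff ** 2 * w).sum(axis=2)              # squared learned distances
    W = np.exp(-d2 / (2.0 * sigma ** 2))          # similarity matrix
    np.fill_diagonal(W, 0.0)
    inv_sqrt_deg = 1.0 / np.sqrt(W.sum(axis=1))
    L = inv_sqrt_deg[:, None] * W * inv_sqrt_deg[None, :]   # D^{-1/2} W D^{-1/2}
    _, vecs = np.linalg.eigh(L)                   # eigenvalues in ascending order
    U = vecs[:, -k:]                              # eigenvectors of the k largest eigenvalues
    U = U / np.linalg.norm(U, axis=1, keepdims=True)
    return kmeans(U, k)


if __name__ == "__main__":
    # Toy data: feature 0 separates two groups, feature 1 is a spread-out nuisance.
    X = np.array([[0.0, -3.0], [0.2, -1.0], [-0.2, 1.0], [0.1, 3.0],
                  [4.0, -3.0], [4.2, -1.0], [3.8, 1.0], [4.1, 3.0]])
    similar = [(0, 1), (2, 3), (4, 5), (6, 7)]
    dissimilar = [(0, 4), (1, 6), (3, 5)]
    w = learn_diagonal_metric(X, similar, dissimilar)
    print("weights:", w)
    print("labels:", spectral_clustering(X, 2, w))
```

On the toy data, the first feature receives a far larger weight than the spread-out second feature, and the two groups of four points fall into separate clusters. Rescaling the weights so that similar pairs have unit mean squared distance is a design choice that makes the fixed default `sigma=1.0` meaningful; a different scaling would require tuning `sigma`.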
Source
Computer Engineering
CAS
CSCD
Peking University Core Journal (北大核心)
2015, No. 1, pp. 207-210, 244 (5 pages)
Funding
Shanxi Province Soft Science Foundation project (2009041052-03)
Keywords
data mining
side-information
similarity matrix
distance metric learning
spectral clustering
UCI data set