Abstract
Traditional clustering algorithms usually build the similarity matrix from Euclidean distances. On data with more complex structure, Euclidean distance cannot reflect global consistency, so it fails to describe the actual distribution of the data points. This paper proposes an adaptive clustering algorithm based on rank constraints density sensitive distance (RCDSD). First, a density-sensitive distance is introduced as the similarity measure for building the similarity matrix. It effectively enlarges the distances between points of different classes and shrinks the distances between points of the same class, which addresses the biased clustering results that traditional algorithms produce when they use Euclidean distance as the similarity measure. Second, a rank constraint is imposed on the Laplacian matrix of the similarity matrix so that the number of connected components of the similarity matrix equals the number of clusters. The data points are thus assigned directly to the correct classes and the final clustering result is obtained without running k-means or any other discretization procedure. Extensive experiments on synthetic and real-world data sets show that the proposed algorithm obtains accurate clustering results and improves clustering performance.
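The abstract does not state the exact distance formula, so the following is a minimal sketch assuming the definition commonly used in the density-sensitive clustering literature: the edge length between two points is ρ^d − 1, where d is their Euclidean distance and ρ > 1 is a scaling factor (an assumed parameter), and the density-sensitive distance is the shortest-path length through the complete graph under those edge lengths. The similarity transform s = 1/(D + 1) is also an assumption. Function names are hypothetical, not taken from the paper.

```python
import math
from typing import List, Sequence

Point = Sequence[float]
Matrix = List[List[float]]


def density_sensitive_distances(points: List[Point], rho: float = 2.0) -> Matrix:
    """Pairwise density-sensitive distances (sketch, assumed definition).

    Edge length between x_i and x_j: rho ** d(x_i, x_j) - 1, with d the
    Euclidean distance and rho > 1. The density-sensitive distance is the
    shortest-path length over the complete graph with these edge lengths.
    """
    if rho <= 1.0:
        raise ValueError("rho must be greater than 1")
    n = len(points)
    # Exponentially stretched edge lengths: one long jump costs more than
    # several short hops covering the same Euclidean length.
    dist = [[rho ** math.dist(points[i], points[j]) - 1.0 for j in range(n)]
            for i in range(n)]
    # Floyd-Warshall: relax every pair through every intermediate point.
    for k in range(n):
        for i in range(n):
            d_ik = dist[i][k]
            for j in range(n):
                if d_ik + dist[k][j] < dist[i][j]:
                    dist[i][j] = d_ik + dist[k][j]
    return dist


def similarity_from_distance(dist: Matrix) -> Matrix:
    """Map distances to similarities in (0, 1] via s_ij = 1 / (D_ij + 1) (assumed)."""
    return [[1.0 / (d + 1.0) for d in row] for row in dist]


# Five points on a dense horizontal chain, plus one isolated point above the origin.
pts = [(0, 0), (1, 0), (2, 0), (3, 0), (4, 0), (0, 4)]
D = density_sensitive_distances(pts, rho=2.0)
S = similarity_from_distance(D)
print(D[0][4], D[0][5])  # prints 4.0 15.0 (both pairs are 4 apart in Euclidean distance)
```

Both pairs are equally far apart in Euclidean terms, yet the point reached through the dense chain ends up much closer (4 versus 15), which is the "enlarge between-class, shrink within-class" effect the abstract describes.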
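The second step rests on a standard fact of spectral graph theory: for a symmetric, nonnegative similarity matrix S on n points, the multiplicity of eigenvalue 0 of its Laplacian L_S equals the number of connected components. Requiring rank(L_S) = n − c therefore forces exactly c components, and labels can be read off the components without k-means. The sketch below does not reproduce the optimization that enforces the constraint (the abstract does not describe it); it assumes S already satisfies the constraint, checks n − rank(L_S), and extracts labels by breadth-first search. All function names and tolerances are hypothetical.

```python
from collections import deque
from typing import List

Matrix = List[List[float]]


def laplacian(S: Matrix) -> Matrix:
    """Unnormalized graph Laplacian L = D - W with W = (S + S^T) / 2."""
    n = len(S)
    W = [[(S[i][j] + S[j][i]) / 2.0 for j in range(n)] for i in range(n)]
    L = [[-W[i][j] for j in range(n)] for i in range(n)]
    for i in range(n):
        # Diagonal: degree minus self-weight, i.e. the sum of off-diagonal weights.
        L[i][i] = sum(W[i][j] for j in range(n) if j != i)
    return L


def matrix_rank(A: Matrix, tol: float = 1e-9) -> int:
    """Numerical rank by Gaussian elimination with partial pivoting."""
    M = [row[:] for row in A]
    n_rows = len(M)
    n_cols = len(M[0]) if M else 0
    rank = 0
    for col in range(n_cols):
        if rank == n_rows:
            break
        pivot = max(range(rank, n_rows), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) <= tol:
            continue  # no usable pivot in this column
        M[rank], M[pivot] = M[pivot], M[rank]
        for r in range(rank + 1, n_rows):
            factor = M[r][col] / M[rank][col]
            for c in range(col, n_cols):
                M[r][c] -= factor * M[rank][c]
        rank += 1
    return rank


def labels_from_components(S: Matrix, tol: float = 1e-12) -> List[int]:
    """Label each point by the connected component it belongs to (BFS)."""
    n = len(S)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        queue = deque([start])
        while queue:
            i = queue.popleft()
            for j in range(n):
                if labels[j] == -1 and (S[i][j] > tol or S[j][i] > tol):
                    labels[j] = current
                    queue.append(j)
        current += 1
    return labels


# A similarity matrix that already has c = 2 connected components.
S = [
    [0.0, 0.9, 0.8, 0.0, 0.0],
    [0.9, 0.0, 0.7, 0.0, 0.0],
    [0.8, 0.7, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.6],
    [0.0, 0.0, 0.0, 0.6, 0.0],
]
c = len(S) - matrix_rank(laplacian(S))  # multiplicity of eigenvalue 0
labels = labels_from_components(S)
print(c, labels)  # prints 2 [0, 0, 0, 1, 1]
```

Once the learned S satisfies the rank constraint, the component search above is the entire "discretization" step, which is why no k-means pass is needed.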
Source
Computer Science (计算机科学)
CSCD
Peking University Core Journal (北大核心)
2017, No. 5, pp. 276-279, 284 (5 pages)
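Funding
National Natural Science Foundation of China (F020806)
Program for Liaoning Excellent Talents in University (LR2015033)
Science and Technology Plan Project of Liaoning Province (2013405003)
Science and Technology Plan Project of Dalian (2013A16GX116)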
Keywords
Density sensitive, Similarity matrix, Rank constraints, Clustering