Abstract
Clustering, an important unsupervised learning method in machine learning, is widely used in image processing and biological gene classification. The algorithm "Clustering by fast search and find of density peaks" (DPC) classifies data by searching for density peaks; it requires neither an iterative process nor many manually supplied parameters. However, DPC performs poorly on spherical datasets, tends to overlook potential cluster centers, and requires manual involvement in selecting cluster centers. To address these problems, this paper computes data density using a fuzzy neighborhood relation and replaces the relative distance in DPC with a comparative distance. In experiments on machine learning datasets, the proposed algorithm is compared with DBSCAN, OPTICS, and DPC in terms of accuracy and the adjusted Rand index (ARI). The experimental results show that the proposed algorithm is feasible and effective.
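For readers unfamiliar with the baseline, the two quantities that standard DPC computes for each point (a local density and a relative distance to the nearest denser point) can be sketched as follows. This is only a minimal illustration of the original DPC with a cutoff kernel, not the method proposed in this paper: the abstract does not define the fuzzy neighborhood density or the comparative distance, so neither is attempted here. The function name `dpc_rho_delta` and the parameter name `d_c` are assumptions made for this example.

```python
import numpy as np

def dpc_rho_delta(X, d_c):
    """Baseline DPC quantities (illustrative sketch, cutoff kernel).

    X   : (n, d) array of points
    d_c : cutoff distance (hypothetical parameter name)
    """
    X = np.asarray(X, dtype=float)
    # Pairwise Euclidean distance matrix.
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # Local density: number of other points strictly closer than d_c.
    rho = (dist < d_c).sum(axis=1) - 1
    n = len(X)
    delta = np.empty(n)
    for i in range(n):
        higher = rho > rho[i]
        if higher.any():
            # Relative distance: distance to the nearest denser point.
            delta[i] = dist[i, higher].min()
        else:
            # No denser point exists: use the largest distance from i.
            delta[i] = dist[i].max()
    return rho, delta
```

Points with both a large density and a large relative distance are the candidate cluster centers; in baseline DPC they are typically picked by hand from a plot of the two quantities, which is the manual step the abstract refers to.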
Authors
Li Xin; Lei Yingke (Electronic Countermeasures Institution of National University of Defense Technology, Hefei, Anhui 230037, China)
Source
Journal of Signal Processing (《信号处理》)
CSCD
Peking University Core Journals (北大核心)
2019, No. 11, pp. 1919-1928 (10 pages)
Keywords
unsupervised machine learning
density peak clustering algorithm
fuzzy clustering algorithm
comparative distance