Abstract
With the development of clustering techniques, the need to cluster data sets that contain regions of different density has become increasingly pressing. To address this problem, this paper proposes MCDD, a multi-stage clustering algorithm based on distance and density. The algorithm uses multi-stage density processing to extract clusters of different densities, applies a density factor to improve clustering precision, and finally removes outliers and noise with a distance threshold. MCDD scans the data set only once and can discover clusters of arbitrary shape and size. Experiments show that the algorithm scales well, identifies outliers and noise effectively, and achieves good clustering accuracy on multi-density data sets.
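The abstract describes MCDD only at a high level, so the following is not the paper's algorithm. It is a minimal sketch of the general idea, assuming that "density" means the number of neighbours within a radius `eps`, that each stage clusters the still-unlabelled points whose density meets a progressively lower threshold (so dense clusters are extracted before sparse ones), and that a point whose nearest neighbour is farther than a distance threshold is an outlier. The names `local_density`, `multi_stage_cluster`, `eps`, `density_thresholds` and `dist_threshold` are hypothetical, and the density factor is not modelled.

```python
import math
from collections import deque


def local_density(points, eps):
    """Count, for each point, how many other points lie within radius eps."""
    n = len(points)
    return [
        sum(1 for j in range(n) if j != i and math.dist(points[i], points[j]) <= eps)
        for i in range(n)
    ]


def multi_stage_cluster(points, eps, density_thresholds, dist_threshold):
    """Hypothetical multi-stage density clustering sketch (not MCDD itself).

    Stages run from the highest to the lowest density threshold; each stage
    only clusters points that earlier stages left unlabelled. Points whose
    nearest neighbour is farther than dist_threshold are labelled -1.
    """
    n = len(points)
    labels = [None] * n
    density = local_density(points, eps)

    # Distance-threshold step (assumed form): isolated points become noise.
    for i in range(n):
        nearest = min(
            (math.dist(points[i], points[j]) for j in range(n) if j != i),
            default=math.inf,
        )
        if nearest > dist_threshold:
            labels[i] = -1

    next_label = 0
    for threshold in sorted(density_thresholds, reverse=True):
        for seed in range(n):
            if labels[seed] is not None or density[seed] < threshold:
                continue
            # Grow a new cluster from this core point by breadth-first search.
            labels[seed] = next_label
            queue = deque([seed])
            while queue:
                i = queue.popleft()
                for j in range(n):
                    if labels[j] is None and math.dist(points[i], points[j]) <= eps:
                        labels[j] = next_label
                        # Only points that are core at this stage keep expanding.
                        if density[j] >= threshold:
                            queue.append(j)
            next_label += 1

    # Points that no stage reached are also reported as noise.
    return [-1 if lab is None else lab for lab in labels]


if __name__ == "__main__":
    pts = [
        (0, 0), (0, 0.1), (0.1, 0), (0.1, 0.1),  # dense cluster
        (5, 5), (5, 5.8), (5.8, 5), (5.8, 5.8),  # sparser cluster
        (20, 20),                                # isolated point
    ]
    print(multi_stage_cluster(pts, eps=1.0, density_thresholds=[3, 2], dist_threshold=2.0))
    # [0, 0, 0, 0, 1, 1, 1, 1, -1]
```

With only the single threshold `[3]`, the sparser group would fall below the density cutoff and be reported as noise. This is the multi-density problem that a staged approach is meant to avoid. The pairwise distance loops are O(n²) and are used only for clarity.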
Funding
Zhengzhou Municipal Science and Technology Research Project (No. 20130737)
Keywords
Density Threshold
Multi-Stage Clustering
Density Factor
Distance Threshold
Outlier