Abstract
Grid-density clustering algorithms have two drawbacks: the grid width and density threshold are hard to determine, and clustering accuracy is low. To address these problems, this paper proposes a parameter-adaptive grid-density clustering algorithm. It defines the standardized dispersion of a data set and uses the data set's natural distribution to adaptively compute a suitable partition width for each dimension. For a range of density thresholds, it counts the noise samples, plots the resulting noise curve, and selects the best density threshold from that curve. A cluster-edge processing step is added to further improve clustering quality. Simulation experiments show that the improved algorithm achieves better clustering results.
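The noise-curve idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the formula for standardized dispersion or the rule for reading the threshold off the curve, so the grid widths are taken as given here, and `pick_threshold` uses a hypothetical rule (choose the threshold just before the largest jump in noise count). All function names are illustrative.

```python
from collections import Counter


def grid_cells(points, widths):
    """Map each point to the integer index tuple of its grid cell."""
    return [tuple(int(x // w) for x, w in zip(p, widths)) for p in points]


def noise_curve(points, widths, thresholds):
    """For each density threshold t, count the points that lie in
    cells holding fewer than t points (treated as noise)."""
    counts = Counter(grid_cells(points, widths))
    curve = []
    for t in thresholds:
        noise = sum(n for n in counts.values() if n < t)
        curve.append((t, noise))
    return curve


def pick_threshold(curve):
    """Hypothetical selection rule (an assumption, not from the paper):
    return the threshold just before the largest jump in noise count."""
    best_t, best_jump = curve[0][0], -1
    for (t0, n0), (_, n1) in zip(curve, curve[1:]):
        jump = n1 - n0
        if jump > best_jump:
            best_t, best_jump = t0, jump
    return best_t


if __name__ == "__main__":
    # Two tight groups of five points each, plus two isolated points.
    pts = [(0.1, 0.1), (0.2, 0.3), (0.4, 0.4), (0.5, 0.2), (0.3, 0.6),
           (5.1, 5.1), (5.2, 5.3), (5.4, 5.4), (5.5, 5.2), (5.3, 5.6),
           (2.5, 3.5), (8.2, 1.1)]
    curve = noise_curve(pts, (1.0, 1.0), range(1, 7))
    print(curve)
    print(pick_threshold(curve))
```

In this toy data the noise count stays flat while only the isolated points fall below the threshold, then rises sharply once the threshold exceeds the size of the dense cells. That flat-then-jump shape is the kind of signal a noise curve is meant to expose.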
Authors
Zheng Cheng (郑诚); Cao Yang (曹杨)
Key Laboratory of Intelligent Computing & Signal Processing of Ministry of Education, Anhui University, Hefei 230601, China; College of Computer Science & Technology, Anhui University, Hefei 230601, China
Source
Application Research of Computers (《计算机应用研究》)
CSCD
Peking University Core Journal (北大核心)
2019, No. 11, pp. 3278-3281, 3309 (5 pages)
Funding
Key Project of Natural Science Research in Universities of Anhui Province (KJ2013A020)
Keywords
grid density
clustering
space partition
noise curve