Abstract
This paper discusses the concepts and algorithms of density-based clustering in data mining and proves the correctness of the cluster analysis performed by the OPTICS (Ordering Points to Identify the Clustering Structure) algorithm. Building on DBSCAN and OPTICS, it proposes a simple and effective density-based clustering algorithm. The new algorithm improves two steps: the ε-neighborhood (region) query and the update of the seed queue. For the region query it gives a simple and efficient hash-table method, which partitions the whole data set, or part of it, into a grid. Experimental results show that the new algorithm clusters large-scale data sets effectively and efficiently.
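The abstract does not give the details of the hash-table region query, so the following is only a minimal sketch of one common way to realize the idea, not the authors' implementation. It assumes a grid whose cell side equals ε, so that every point within ε of a query point lies in the query's own cell or an adjacent one. The names `build_grid` and `region_query` are hypothetical.

```python
from collections import defaultdict
from itertools import product
from math import dist, floor


def cell_of(point, eps):
    """Map a point to the integer coordinates of its grid cell (side = eps)."""
    return tuple(floor(c / eps) for c in point)


def build_grid(points, eps):
    """Hash every point index into the grid cell that contains the point."""
    grid = defaultdict(list)
    for idx, p in enumerate(points):
        grid[cell_of(p, eps)].append(idx)
    return grid


def region_query(points, grid, eps, q):
    """Return the sorted indices of all points within eps of points[q].

    Only the cell of points[q] and its adjacent cells are scanned; this is
    sufficient because the cell side is assumed to equal eps.
    """
    center = cell_of(points[q], eps)
    neighbors = []
    for offset in product((-1, 0, 1), repeat=len(center)):
        cell = tuple(c + o for c, o in zip(center, offset))
        # .get avoids creating empty cells in the defaultdict
        for idx in grid.get(cell, ()):
            if dist(points[idx], points[q]) <= eps:
                neighbors.append(idx)
    return sorted(neighbors)


points = [(0.0, 0.0), (0.5, 0.0), (0.9, 0.9), (3.0, 3.0), (1.05, 0.0)]
eps = 1.0
grid = build_grid(points, eps)
print(region_query(points, grid, eps, 1))
```

With the grid built once, each query inspects at most 3^d cells in d dimensions instead of the whole data set, which is the kind of saving a density-based algorithm needs when it issues one region query per point.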
Source
Journal of Nanjing University of Posts and Telecommunications (Natural Science Edition)
2005, No. 4, pp. 24-29 (6 pages)
Keywords
Data Mining
Clustering
Distance
Density
Region Queries