
An Optimized Grid-Based Clustering Algorithm (cited by: 6)

Optimized Cell-based Clustering Algorithm
Abstract: Clustering is an important research topic in data mining. Compared with other algorithms, grid-based (cell-based) clustering algorithms can process massive low-dimensional data efficiently. However, the number of cells produced by the partitioning grows exponentially with the dimensionality of the data, so for higher-dimensional datasets too many cells are generated and the algorithms become inefficient. This paper designs a new cell-based clustering algorithm built on the CD-Tree, which is far more efficient than traditional cell-based clustering algorithms. In addition, to improve efficiency further, a pruning strategy is designed that removes non-dense cells before the clustering procedure. Extensive experiments on real and synthetic datasets show that, compared with traditional clustering algorithms, the CD-Tree-based algorithm scales significantly better with both dataset size and dimensionality.
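The general idea summarized in the abstract (store only occupied cells, prune non-dense cells, then cluster what remains) can be illustrated with a minimal sketch. This is not the paper's CD-Tree algorithm: a plain Python dictionary stands in for the CD-Tree as a container of non-empty cells, and the function name `grid_cluster` and the parameters `cell_width` and `min_pts` are hypothetical names chosen for illustration only.

```python
from collections import defaultdict, deque
from itertools import product


def grid_cluster(points, cell_width, min_pts):
    """Cluster points by connecting adjacent dense grid cells.

    Assumptions (not from the paper): cell_width is the side length of
    a cell and min_pts is the density threshold for keeping a cell.
    """
    # Map each point to the integer coordinates of its cell. Only
    # non-empty cells are stored (a dict here, not a CD-Tree).
    cells = defaultdict(list)
    for idx, p in enumerate(points):
        key = tuple(int(x // cell_width) for x in p)
        cells[key].append(idx)

    # Pruning step: drop non-dense cells before clustering.
    dense = {k: v for k, v in cells.items() if len(v) >= min_pts}

    # Offsets to every neighbouring cell (excluding the cell itself).
    dim = len(points[0]) if points else 0
    offsets = [o for o in product((-1, 0, 1), repeat=dim) if any(o)]

    labels = [-1] * len(points)  # -1 marks points in pruned cells
    seen = set()
    cluster_id = 0
    for start in dense:
        if start in seen:
            continue
        # Breadth-first search over adjacent dense cells.
        seen.add(start)
        queue = deque([start])
        while queue:
            cell = queue.popleft()
            for idx in dense[cell]:
                labels[idx] = cluster_id
            for off in offsets:
                nb = tuple(c + d for c, d in zip(cell, off))
                if nb in dense and nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        cluster_id += 1
    return labels
```

Because only occupied cells are kept, memory depends on the number of points rather than on the full grid size. The neighbour enumeration in this sketch still visits 3^d - 1 offsets per cell, so it is only practical in low dimensions.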
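Source: Journal of Chinese Computer Systems (《小型微型计算机系统》), CSCD, Peking University Core Journal, 2006, No. 10, pp. 1927-1930 (4 pages)
Funding: Supported by the National Natural Science Foundation of China (60573090), the Natural Science Foundation of Liaoning Province (20052006), and a Key Research Program of the Education Department of Liaoning Province (05L354).
Keywords: data mining; clustering analysis; CD-Tree; cell-based algorithm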

