
Research on Shared Nearest Neighbor Clustering for Large Datasets (面向大数据集的共享近邻聚类研究)

Cited by: 5
Abstract: Shared nearest neighbor (SNN) similarity can effectively overcome the cluster-validity problems caused by factors such as high dimensionality and multiple densities, but computing the SNN similarity matrix is expensive. Based on a divide-and-conquer strategy, an improved shared nearest neighbor clustering algorithm (DC-SNN) is proposed to address this issue. A soft partitioning strategy divides the dataset into several small subsets. This reduces the number of data points that must be searched when computing the SNN similarity matrix of each subset, and it also avoids the adverse effect that subset partition boundaries would otherwise have on the K nearest neighbors of data points. Furthermore, based on two notions defined within each subset, core data points and extended data points, a computing method and a merging strategy for the subset SNN similarity matrices are given, ensuring that the SNN similarity matrix of the whole dataset can be validly represented by those of the subsets. Experimental results show that DC-SNN significantly improves the efficiency of shared nearest neighbor clustering without reducing clustering accuracy.
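To make the cost that the abstract refers to concrete, the following is a minimal sketch of the baseline SNN similarity on a whole dataset, not the DC-SNN algorithm itself (the abstract does not give its details). It assumes Euclidean distance, a brute-force neighbor search, and the simple variant in which the similarity of two points is the number of points that appear in both of their K-nearest-neighbor lists. The function names `knn_indices` and `snn_similarity` are illustrative and do not come from the paper.

```python
import numpy as np

def knn_indices(points, k):
    """Return an (n, k) array holding the k nearest neighbors of each point.

    Brute force: builds the full n x n distance matrix, which is the
    quadratic cost that partitioning the dataset aims to reduce.
    """
    points = np.asarray(points, dtype=float)
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    np.fill_diagonal(dist, np.inf)  # a point is not its own neighbor
    return np.argsort(dist, axis=1)[:, :k]

def snn_similarity(points, k):
    """Return an (n, n) matrix counting shared k-nearest neighbors.

    Assumed variant: sim[i, j] = |kNN(i) intersect kNN(j)|, diagonal set to 0.
    """
    n = len(points)
    neighbors = knn_indices(points, k)
    # member[i, p] == 1 when point p is in the k-NN list of point i
    member = np.zeros((n, n), dtype=int)
    member[np.arange(n)[:, None], neighbors] = 1
    sim = member @ member.T  # row-wise overlap of neighbor lists
    np.fill_diagonal(sim, 0)
    return sim

if __name__ == "__main__":
    # Two small, well-separated groups of points
    data = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
    print(snn_similarity(data, k=2))
```

In this sketch, points in the same group share neighbors while points in different groups share none, which is why SNN similarity is less sensitive to differences in density than raw distance. Some formulations additionally require two points to appear in each other's neighbor lists before counting shared neighbors; that condition is omitted here.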
Source: Journal of Chinese Computer Systems (《小型微型计算机系统》), indexed in CSCD and the Peking University Core Journals list, 2014, No. 1, pp. 50-54 (5 pages)
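Funding: Supported by the Guangdong Province and Ministry of Education Industry-University-Research Cooperation Project (2011B090400466), the Guangdong Provincial Education Science Planning Project (2010tjk119), and a Guangdong University of Finance university-level research project (11XJ04-03).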
Keywords: shared nearest neighbor; divide and conquer; large dataset; clustering analysis
