
Fast Clustering by Synchronization on Large Sample (快速大样本同步聚类) Cited by: 2
Abstract: The existing synchronization clustering algorithm Sync has high time complexity, which severely limits its use on large datasets. To address this, the paper proposes Fast Clustering by Synchronization on Large Sample (FCSLS). The algorithm first compresses the large dataset with a sampling method based on kernel density estimation (KDE), then runs synchronization clustering on the compressed set, using the Davies-Bouldin index to select the best number of clusters automatically. Finally, it clusters the remaining data to obtain the final clustering result. Experiments on synthetic datasets and real-world UCI datasets show that FCSLS finds clusters of arbitrary shape, density, and size in large datasets without a preset number of clusters. Compared with fast compression methods based on reduced set density estimation (RSDE) and the center-constrained minimum enclosing ball (CCMEB), FCSLS greatly shortens the running time of synchronization clustering without loss of clustering accuracy.
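The pipeline the abstract describes (compress, synchronize, select by Davies-Bouldin, assign the rest) can be sketched roughly as follows. This is an illustrative sketch and not the authors' implementation: the function names, the `eps` grid, and the stopping rule are hypothetical, the paper's KDE-based sampler is replaced by a plain uniform random subsample as a stand-in, and nearest-sample assignment is only an assumed way of clustering the remaining data. The features are assumed to be scaled to a small range (for example [0, π/2]) so that the sine coupling stays attractive.

```python
import numpy as np

def sync_cluster(X, eps, max_iter=50, tol=1e-3):
    """Kuramoto-style synchronization: each point drifts toward its eps-neighbors."""
    P = np.asarray(X, dtype=float).copy()
    for _ in range(max_iter):
        diff = P[None, :, :] - P[:, None, :]        # diff[i, j] = P[j] - P[i]
        nbr = np.linalg.norm(diff, axis=2) <= eps   # eps-neighborhood, self included
        step = (np.sin(diff) * nbr[:, :, None]).sum(axis=1) / nbr.sum(axis=1, keepdims=True)
        P += step
        if np.abs(step).max() < tol:                # points have stopped moving
            break
    # Points that ended up at (almost) the same location share a label.
    labels = -np.ones(len(P), dtype=int)
    k = 0
    for i in range(len(P)):
        if labels[i] < 0:
            close = (np.linalg.norm(P - P[i], axis=1) < eps / 2) & (labels < 0)
            labels[close] = k
            k += 1
    return labels

def davies_bouldin(X, labels):
    """Davies-Bouldin index (lower is better); inf when fewer than two clusters.
    Note: singleton clusters have zero scatter, so a real implementation would
    need to treat them separately (for example as outliers)."""
    ids = np.unique(labels)
    if len(ids) < 2:
        return np.inf
    cents = np.array([X[labels == c].mean(axis=0) for c in ids])
    scat = np.array([np.linalg.norm(X[labels == c] - cents[i], axis=1).mean()
                     for i, c in enumerate(ids)])
    d = np.linalg.norm(cents[:, None, :] - cents[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)                     # ignore a cluster vs. itself
    return ((scat[:, None] + scat[None, :]) / d).max(axis=1).mean()

def fcsls_sketch(X, n_sample, eps_grid, seed=0):
    """Compress -> synchronize -> pick eps by Davies-Bouldin -> assign the rest."""
    rng = np.random.default_rng(seed)
    # ASSUMPTION: uniform subsample used in place of the paper's KDE-based sampling.
    idx = rng.choice(len(X), size=n_sample, replace=False)
    S = X[idx]
    best = None
    for eps in eps_grid:
        lab = sync_cluster(S, eps)
        score = davies_bouldin(S, lab)
        if best is None or score < best[0]:
            best = (score, eps, lab)
    # ASSUMPTION: every point inherits the label of its nearest sampled point.
    nearest = np.linalg.norm(X[:, None, :] - S[None, :, :], axis=2).argmin(axis=1)
    return best[2][nearest], best[1]

# Usage: two well-separated square blobs of 200 points each.
demo_rng = np.random.default_rng(1)
demo_X = np.vstack([demo_rng.uniform(0.0, 0.2, (200, 2)),
                    demo_rng.uniform(1.0, 1.2, (200, 2))])
demo_labels, demo_eps = fcsls_sketch(demo_X, n_sample=40, eps_grid=[0.4, 2.0])
```

Only the small sample goes through the quadratic-cost synchronization loop; the full dataset is touched once, in the final assignment step, which is where the running-time saving claimed in the abstract would come from.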
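Authors: Qiao Ying (乔颖), Wang Shitong (王士同)
Source: Computer Engineering and Applications (《计算机工程与应用》), CSCD, Peking University Core Journal, 2016, No. 23, pp. 159-166, 219 (9 pages)
Funding: National Natural Science Foundation of China (No. 61272210)
Keywords: kernel density estimation (KDE); sampling; synchronization; large sample; clustering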
