
Constraint-based incremental clustering algorithm with mixed attributes
Abstract: To address limited memory capacity when clustering large-scale datasets, this paper proposes a fast clustering algorithm constrained by the number of clusters. The algorithm scans the original dataset only once, and its radius threshold changes dynamically as clustering proceeds. It also defines an inter-cluster dissimilarity measure that incorporates the value-frequency information of categorical attributes, so it can be applied to datasets with mixed attributes. Time complexity and space complexity are approximately linear in the dataset size and the number of attributes. Experimental results on the KDDCUP99 dataset show that the proposed algorithm requires few input parameters, has good clustering properties, and can be used on large-scale datasets.
Source: Computer Engineering and Design (《计算机工程与设计》), indexed in CSCD and the Peking University Core Journals list, 2010, No. 8, pp. 1799-1801, 1805 (4 pages)
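Funding: National High Technology Research and Development Program of China (863 Program), grant 2006AA01A120; National Natural Science Foundation of China, grant 10871040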
Keywords: mixed attributes; incremental clustering; dissimilarity measure; large-scale dataset; constraint
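
The abstract gives no formulas, so the following is only a minimal sketch of the general idea it describes, not the authors' algorithm. It keeps one summary per cluster (size, numeric sums, categorical value counts), reads each record once, and enlarges the radius threshold whenever the number of clusters exceeds the allowed maximum. The names `Cluster`, `dissimilarity`, `one_pass_cluster`, the parameters `radius` and `growth`, and the specific dissimilarity formula are all assumptions made for illustration. Numeric attributes are assumed to be scaled to [0, 1].

```python
from collections import Counter


class Cluster:
    """Cluster summary: size, per-attribute numeric sums, categorical counts."""

    def __init__(self, n_num, n_cat):
        self.n = 0
        self.sums = [0.0] * n_num
        self.freqs = [Counter() for _ in range(n_cat)]

    def add(self, num, cat):
        self.n += 1
        for i, v in enumerate(num):
            self.sums[i] += v
        for j, v in enumerate(cat):
            self.freqs[j][v] += 1

    def merge(self, other):
        self.n += other.n
        for i, s in enumerate(other.sums):
            self.sums[i] += s
        for j, f in enumerate(other.freqs):
            self.freqs[j].update(f)


def dissimilarity(a, b):
    """Hypothetical measure: mean per-attribute difference of two summaries."""
    total = 0.0
    for sa, sb in zip(a.sums, b.sums):
        total += abs(sa / a.n - sb / b.n)  # distance between numeric means
    for fa, fb in zip(a.freqs, b.freqs):
        # Chance that a value drawn from each cluster matches, from frequencies.
        overlap = sum(fa[v] * fb[v] for v in fa) / (a.n * b.n)
        total += 1.0 - overlap
    return total / (len(a.sums) + len(a.freqs))


def _absorb(clusters, candidate, radius):
    """Merge candidate into its nearest cluster if within radius, else append."""
    best = min(clusters, key=lambda c: dissimilarity(c, candidate), default=None)
    if best is not None and dissimilarity(best, candidate) <= radius:
        best.merge(candidate)
    else:
        clusters.append(candidate)


def one_pass_cluster(records, max_clusters, radius=0.1, growth=1.5):
    """Single scan over (numeric_tuple, categorical_tuple) records."""
    if max_clusters < 1 or radius <= 0 or growth <= 1:
        raise ValueError("need max_clusters >= 1, radius > 0, growth > 1")
    clusters = []
    for num, cat in records:
        point = Cluster(len(num), len(cat))
        point.add(num, cat)
        _absorb(clusters, point, radius)
        # Constraint on the number of clusters: enlarge the radius and
        # re-merge the summaries (not the raw data) until it is satisfied.
        while len(clusters) > max_clusters:
            radius *= growth
            merged = []
            for c in clusters:
                _absorb(merged, c, radius)
            clusters = merged
    return clusters, radius
```

Because only the summaries are held in memory, storage in this sketch grows with the number of clusters and attributes rather than with the number of records. A call such as `one_pass_cluster(records, max_clusters=5)` returns the cluster summaries together with the final radius.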
