Constraint-based incremental clustering algorithm with mixed attributes

Abstract: To address the limited memory capacity available when clustering large-scale datasets, a fast clustering algorithm constrained by the number of clusters is proposed. The algorithm scans the original dataset only once, and its radius threshold changes dynamically as clustering proceeds. An inter-cluster dissimilarity measure that incorporates the frequency information of categorical attribute values is also defined, so the algorithm can be applied to mixed-attribute datasets. Its time and space complexity are approximately linear in the size of the dataset and in the number of attributes. Experimental results on the KDDCUP99 dataset show that the proposed algorithm needs few input parameters, has good clustering properties, and can be used for large-scale datasets.
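The abstract gives no formulas, so the Python sketch below only illustrates the general idea under stated assumptions: each cluster is kept as an in-memory summary (size, numeric sums, categorical value frequencies), records are read in a single pass, and whenever the number of clusters exceeds the constraint `k`, the two closest clusters are merged and the radius threshold grows to their distance. The names `ClusterSummary`, `dissimilarity`, `closest_pair` and `incremental_cluster`, the centroid-based numeric term, the frequency-matching categorical term, and the radius-update rule are all hypothetical choices for illustration, not the authors' published definitions. Numeric attributes are assumed to be pre-scaled to [0, 1].

```python
# Hypothetical sketch of single-pass, cluster-count-constrained clustering
# for mixed attributes; not the authors' exact formulas.
from collections import Counter
from dataclasses import dataclass
from typing import List, Sequence, Tuple

# A record is (numeric attributes scaled to [0, 1], categorical attributes).
Record = Tuple[Sequence[float], Sequence[str]]


@dataclass
class ClusterSummary:
    """Hypothetical per-cluster summary kept in memory instead of raw records."""
    n: int                   # number of records absorbed
    num_sum: List[float]     # per-attribute sums of numeric values
    cat_freq: List[Counter]  # per-attribute value -> frequency counts

    @classmethod
    def from_record(cls, num: Sequence[float], cat: Sequence[str]) -> "ClusterSummary":
        return cls(1, list(num), [Counter([v]) for v in cat])

    def centroid(self) -> List[float]:
        return [s / self.n for s in self.num_sum]

    def absorb(self, other: "ClusterSummary") -> None:
        """Merge another summary into this one (sizes, sums and counts add up)."""
        self.n += other.n
        self.num_sum = [a + b for a, b in zip(self.num_sum, other.num_sum)]
        for mine, theirs in zip(self.cat_freq, other.cat_freq):
            mine.update(theirs)


def dissimilarity(a: ClusterSummary, b: ClusterSummary) -> float:
    """Assumed mixed-attribute measure, averaged over all attributes.

    Numeric part: absolute difference of the cluster centroids.
    Categorical part: 1 - probability that a random member of `a` and a random
    member of `b` share the value, computed from the stored frequencies.
    """
    num_part = sum(abs(x - y) for x, y in zip(a.centroid(), b.centroid()))
    cat_part = 0.0
    for fa, fb in zip(a.cat_freq, b.cat_freq):
        match = sum(count * fb[v] for v, count in fa.items()) / (a.n * b.n)
        cat_part += 1.0 - match
    return (num_part + cat_part) / (len(a.num_sum) + len(a.cat_freq))


def closest_pair(clusters: List[ClusterSummary]) -> Tuple[int, int, float]:
    """Return (i, j, distance) of the most similar pair, with i < j."""
    best = (0, 1, float("inf"))
    for i in range(len(clusters)):
        for j in range(i + 1, len(clusters)):
            d = dissimilarity(clusters[i], clusters[j])
            if d < best[2]:
                best = (i, j, d)
    return best


def incremental_cluster(records: Sequence[Record], k: int, radius: float = 0.0):
    """One pass over `records`, never keeping more than k cluster summaries."""
    clusters: List[ClusterSummary] = []
    for num, cat in records:
        point = ClusterSummary.from_record(num, cat)
        if clusters:
            dists = [dissimilarity(point, c) for c in clusters]
            best = min(range(len(dists)), key=lambda idx: dists[idx])
            if dists[best] <= radius:
                clusters[best].absorb(point)
                continue
        clusters.append(point)
        if len(clusters) > k:
            # Too many clusters: merge the closest pair and grow the radius
            # to their distance, so later records are absorbed more readily.
            i, j, d = closest_pair(clusters)
            clusters[i].absorb(clusters[j])
            del clusters[j]  # j > i, so index i is unaffected
            radius = max(radius, d)
    return clusters, radius


if __name__ == "__main__":
    data = [([0.0], ["tcp"]), ([0.1], ["tcp"]), ([1.0], ["udp"]), ([0.9], ["udp"])]
    clusters, radius = incremental_cluster(data, k=2)
    print(sorted(c.n for c in clusters))  # [2, 2]
```

Each record is read once and at most k + 1 summaries are held in memory, so for a fixed k the cost grows roughly linearly with the number of records and attributes, which is consistent with the complexity stated in the abstract.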

Authors: Su Xiaoke (苏晓珂), Lan Yang (兰洋), Cheng Yaodong (程耀东), Wan Renxia (万仁霞)

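Affiliations: College of Information Science and Technology, Donghua University; College of Computer and Information Technology, Xinyang Normal University; Computing Center, Institute of High Energy Physics, Chinese Academy of Sciences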

Source: Computer Engineering and Design (《计算机工程与设计》), indexed in CSCD and the Peking University core journal list, 2010, No. 8, pp. 1799-1801, 1805 (4 pages)

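Funding: National High Technology Research and Development Program of China (863 Program) (2006AA01A120); National Natural Science Foundation of China (10871040)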

Keywords: mixed attributes; incremental clustering; dissimilarity measure; large-scale dataset; constraint

CLC number: TP391 [Automation and Computer Technology: Computer Application Technology]
