Self-Adaptive Fractal Technique on Detecting Cluster Evolution (自适应分形聚类进化甄别算法)

Abstract: Data streams evolve over time in bursty and random ways, and tracking these changes adaptively and in real time is a central problem in data stream mining. Relying on users to detect such changes by trial and error is impractical, and it defeats the purpose of tracking cluster evolution in a stream. To track cluster changes automatically under realistic limits on computing resources and requirements on processing speed, this paper combines fractal clustering, self-adaptive sampling, and the Chernoff inequality into a real-time algorithm that tracks cluster evolution in data streams. Experiments on a number of synthetic and real data sets illustrate the effectiveness and efficiency of the approach.
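The abstract names its ingredients but does not spell out how they are combined, so the following is only a minimal sketch of two of them under stated assumptions, not the authors' algorithm. `box_counting_dimension` estimates a fractal (box-counting) dimension of a point set, the quantity that fractal clustering methods typically monitor for change. `chernoff_sample_size` uses the standard multiplicative Chernoff lower-tail bound, exp(-n·p·δ²/2), to pick a sample size. Both function names, their parameters, and the choice of the box-counting dimension are hypothetical and introduced here for illustration.

```python
import math
from typing import Iterable, Sequence


def box_counting_dimension(points: Iterable[Sequence[float]],
                           cell_sizes: Sequence[float]) -> float:
    """Estimate the box-counting dimension of a point set.

    For each cell side r, count the occupied grid cells N(r), then return
    the least-squares slope of log N(r) against log(1/r).
    Assumption: illustrative estimator, not taken from the paper.
    """
    pts = [tuple(p) for p in points]
    if not pts or len(set(cell_sizes)) < 2:
        raise ValueError("need points and at least two distinct cell sizes")
    xs, ys = [], []
    for r in cell_sizes:
        # A cell is identified by the integer grid index of each coordinate.
        occupied = {tuple(math.floor(c / r) for c in p) for p in pts}
        xs.append(math.log(1.0 / r))
        ys.append(math.log(len(occupied)))
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def chernoff_sample_size(p: float, delta: float, rho: float) -> int:
    """Smallest n with exp(-n * p * delta**2 / 2) <= rho.

    With n independent draws, a cluster holding a fraction p of the data
    then contributes at least (1 - delta) * n * p sampled points with
    probability at least 1 - rho (multiplicative Chernoff lower-tail bound).
    """
    if not (0 < p <= 1 and 0 < delta < 1 and 0 < rho < 1):
        raise ValueError("p must be in (0, 1]; delta and rho in (0, 1)")
    return math.ceil(2.0 * math.log(1.0 / rho) / (p * delta ** 2))


if __name__ == "__main__":
    # Points on a diagonal line segment and on a filled square grid.
    line = [(i / 1024, i / 1024) for i in range(1024)]
    grid = [(i / 64, j / 64) for i in range(64) for j in range(64)]
    scales = [1 / 2, 1 / 4, 1 / 8, 1 / 16]
    print(box_counting_dimension(line, scales))  # close to 1
    print(box_counting_dimension(grid, scales))  # close to 2
    print(chernoff_sample_size(p=0.1, delta=0.5, rho=0.05))
```

A tracker built along these lines could recompute the dimension on each new sample and flag a change when the estimate shifts by more than a chosen tolerance; the tolerance and the resampling policy are left open here because the abstract does not specify them.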

Authors: YAN Guanghui (闫光辉), DONG Xiaohui (董晓慧), LIU Yun (刘云), HE Shaoling (贺少领), MA Zhicheng (马志程)

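Affiliations: School of Electronic and Information Engineering, Lanzhou Jiaotong University (兰州交通大学电子与信息工程学院); Gansu Electric Power Information and Communication Center (甘肃电力信息通信中心)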

Source: Journal of Frontiers of Computer Science and Technology (《计算机科学与探索》), CSCD, 2010, No. 7, pp. 662-672 (11 pages)

Funding: Program for New Century Excellent Talents (新世纪优秀人才支持计划), No. NCET-10-0017; Lanzhou Science and Technology Program (兰州市科技计划项目), No. 2008-1-28

Keywords: data mining; cluster evolution; fractal; self-adaptive sampling

Classification code: TP311 [Automation and Computer Technology: Computer Software and Theory]
