Effective Twice-Clustering Algorithm for Data Streams (cited by: 2)

Abstract: To improve the quality of data stream clustering when the data distribution is irregular and the data contain noise, an effective twice-clustering algorithm for data streams, TCLUSA, is proposed. TCLUSA is based on a simple divide-and-conquer (partitioning) idea and separability theorems. It clusters each partition with DBSCAN (density-based spatial clustering of applications with noise) and takes the mean point of each resulting cluster as that cluster's representative point, then obtains the final result by clustering the representative points with k-means. A layered structure stores the cluster reference points obtained in each round of clustering until the final result is reached. Theoretical analysis and experimental results show that TCLUSA can effectively improve clustering quality when the data distribution is irregular or when a high-dimensional data stream is processed.
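The two-stage procedure described in the abstract can be sketched roughly as follows. This is a minimal pure-Python illustration under stated assumptions, not the authors' implementation: the function names (`dbscan`, `kmeans`, `twice_cluster`), the fixed-size chunking, and the parameters `eps`, `min_pts`, and `chunk_size` are hypothetical, and the layered storage of reference points is simplified to a single flat list.

```python
import math
import random


def mean_point(pts):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(c) / len(pts) for c in zip(*pts))


def dbscan(points, eps, min_pts):
    """Return one label per point: a cluster id (0, 1, ...) or -1 for noise."""
    n = len(points)
    labels = [None] * n

    def neighbors(i):
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisionally noise; may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reachable from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nb = neighbors(j)
            if len(nb) >= min_pts:  # j is a core point, so keep expanding
                queue.extend(nb)
    return labels


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd iterations with randomly sampled initial centers."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            idx = min(range(k), key=lambda c: math.dist(p, centers[c]))
            groups[idx].append(p)
        # An empty group keeps its previous center.
        centers = [mean_point(g) if g else centers[i]
                   for i, g in enumerate(groups)]
    return centers


def twice_cluster(stream, chunk_size, eps, min_pts, k):
    """Stage 1: DBSCAN per chunk, keeping cluster means as representatives.
    Stage 2: k-means over all representatives collected so far."""
    representatives = []  # assumption: one flat layer instead of a hierarchy
    for start in range(0, len(stream), chunk_size):
        chunk = stream[start:start + chunk_size]
        labels = dbscan(chunk, eps, min_pts)
        for c in range(max(labels, default=-1) + 1):
            members = [p for p, lab in zip(chunk, labels) if lab == c]
            representatives.append(mean_point(members))
    return kmeans(representatives, min(k, len(representatives)))
```

Because DBSCAN labels isolated points as noise, they never contribute a representative, so the second-stage k-means sees only the means of dense regions. A call such as `twice_cluster(points, chunk_size=1000, eps=0.5, min_pts=5, k=3)` would return up to three final centers; the parameter values here are placeholders only.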

Authors: HU Xuegang, CAO Yongzhao, WU Gongqing

Affiliation: School of Computer and Information, Hefei University of Technology

Source: Journal of Southwest Jiaotong University (indexed in EI, CSCD, Peking University Core Journals), 2009, No. 4, pp. 490-494 (5 pages)

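Funding: Natural Science Foundation of Anhui Province (050420207); Research Funding Program for Young Teachers of Anhui Higher Education Institutions (2005jq1012)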

Keywords: data stream; clustering; reference point of density cluster; k-means reference point

Classification: TP311 [Automation and Computer Technology: Computer Software and Theory]
