
Effective Twice-Clustering Algorithm for Data Streams (cited by: 2)
Abstract: To improve the quality of data stream clustering when the data are irregularly distributed and noisy, an effective twice-clustering algorithm for data streams, TCLUSA, is proposed. TCLUSA is based on partitioning (simple divide-and-conquer) and separability theorems. It clusters each partition with DBSCAN (density-based spatial clustering of applications with noise), takes the mean point of each resulting cluster as that cluster's representative point, and then clusters the representative points with k-means to obtain the final result. A layered structure stores the cluster reference points produced by each clustering pass until the final result is reached. Theoretical analysis and experimental results show that TCLUSA can effectively improve clustering quality for data streams, including when the data distribution is abnormal or the stream is high-dimensional.
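The two-stage idea in the abstract can be illustrated with a minimal sketch. This is not the authors' TCLUSA implementation: the function names (`dbscan`, `mean_point`, `kmeans`, `twice_cluster`) and the parameters `chunk_size`, `eps`, `min_pts`, and `k` are assumptions made for illustration, the stream is modeled as an in-memory list, and the layered storage of reference points is simplified to a flat list.

```python
import math
import random


def dbscan(points, eps, min_pts):
    """Minimal DBSCAN. Returns a list of clusters; noise points are dropped."""
    n = len(points)
    labels = [None] * n  # None = unvisited, -1 = noise, >= 0 = cluster id

    def neighbors(i):
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]

    cid = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisionally noise; may become a border point
            continue
        labels[i] = cid
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cid  # noise reached from a core point: border point
            if labels[j] is not None:
                continue
            labels[j] = cid
            nb = neighbors(j)
            if len(nb) >= min_pts:  # j is a core point, so keep expanding
                queue.extend(nb)
        cid += 1
    return [[points[i] for i in range(n) if labels[i] == c] for c in range(cid)]


def mean_point(cluster):
    """Coordinate-wise mean of a non-empty cluster."""
    dim = len(cluster[0])
    return tuple(sum(p[d] for p in cluster) / len(cluster) for d in range(dim))


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd-style k-means; returns the list of k centers."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            idx = min(range(k), key=lambda c: math.dist(p, centers[c]))
            groups[idx].append(p)
        # An empty group keeps its previous center.
        new_centers = [mean_point(g) if g else centers[i]
                       for i, g in enumerate(groups)]
        if new_centers == centers:
            break
        centers = new_centers
    return centers


def twice_cluster(stream, chunk_size, eps, min_pts, k):
    """Stage 1: DBSCAN per chunk, keep cluster means. Stage 2: k-means on means."""
    reps = []  # flat list of representative points (simplification)
    for start in range(0, len(stream), chunk_size):
        chunk = stream[start:start + chunk_size]
        reps.extend(mean_point(c) for c in dbscan(chunk, eps, min_pts))
    return kmeans(reps, min(k, len(reps)))
```

In this sketch, DBSCAN filters noise inside each chunk, so only the mean points of dense clusters reach the k-means stage. Calling `twice_cluster(points, chunk_size=9, eps=0.5, min_pts=3, k=2)` on a list of 2-D tuples returns up to two final centers.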
Source: Journal of Southwest Jiaotong University (《西南交通大学学报》), indexed in EI, CSCD, and the Peking University Core Journals list; 2009, No. 4, pp. 490-494 (5 pages).
Funding: Natural Science Foundation of Anhui Province (050420207); Anhui Provincial Research Funding Program for Young University Teachers (2005jq1012).
Keywords: data stream clustering; reference point of density cluster; k-means reference point


