期刊文献+

一种基于密度的空间数据流在线聚类算法 被引量:28

An On-line Density-based Clustering Algorithm for Spatial Data Stream
下载PDF
导出
摘要 为了解决空间数据流中任意形状簇的聚类问题,提出了一种基于密度的空间数据流在线聚类算法(On-line density-based clustering algorithm for spatial data stream,OLDStream),该算法在先前聚类结果上聚类增量空间数据,仅对新增空间点及其满足核心点条件的邻域数据做局部聚类更新,降低聚类更新的时间复杂度,实现对空间数据流的在线聚类.OLDStream算法具有快速处理大规模空间数据流、实时获取全局任意形状的聚类簇结果、对数据流的输入顺序不敏感、并能发现孤立点数据等优势.在真实数据和合成数据上的综合实验验证了算法的聚类效果、高效率性和较高的可伸缩性,同时实验结果的统计分析显示仅有4%的空间点消耗最坏运行时间,对每个空间点的平均聚类时间约为0.033ms. We propose an efficient online density-based clustering algorithm(On-line density-based clustering algorithm for spatial data stream,OLDStream),which is designed for online discovering clusters in spatial data stream.In OLDStream,only the new spatial point and its adjunct points which satisfy core point are processed in clustering update.And the overall clusters results can be accessed instantaneously.The developed algorithm has exhibited many advantages such as its high scalability to online process incremental large-scale spatial data,its capability to discover overall clusters with arbitrary shape instantaneously,its insensitivity to the input sequence of data stream,and its capability to detect all isolated points.An experimental evaluation of the effectiveness,efficiency and scalability of our algorithm was performed by using real data and large synthetic data from Matlab and Thomas Brinkhoff s network-based generator.Experimental results vividly demonstrated that our algorithm can fast and efficiently cluster new points based on the previous points.The statistics of the results showed that only 4% of the points take the worst case running time,and the average running time is about 0.033 ms for each point process.
出处 《自动化学报》 EI CSCD 北大核心 2012年第6期1051-1059,共9页 Acta Automatica Sinica
基金 国家高技术研究发展计划(863计划)(2011AA040101) 国家自然科学基金(61172049 61003251) 教育部博士点基金(20100006110015)资助~~
关键词 空间数据挖掘 聚类数据流 基于密度的聚类 在线算法 噪声处理 Spatial data mining clustering data stream density-based clustering on-line algorithm handling noise
  • 相关文献

参考文献21

  • 1陆锋 段滢滢 袁文.LBS的数据处理技术[J].中国计算机学会通讯,2010,. 被引量:1
  • 2Guha S, Meyerson A, Mishra N, Motwani R, O'Callaghan L. Clustering data streams: theory and practice. IEEE Trans-actions on Knowledge and Data Engineering, 2003, 15(3): 515-528. 被引量:1
  • 3Han J W, Kamber M. Data Mining Concepts and Tech- niques. Beijing: China Machine Press, 2006. 196-211. 被引量:1
  • 4Ester M, Kriegel H P, Sander J, Xu X W. A density-based algorithm for discovering clusters in large spatial databases with noise. In: Proceedings of the 2nd International Confer- ence on Knowledge Discovery and Data Mining. Portland, USA: AAAI Press, 1996. 226-231. 被引量:1
  • 5Sander J, Ester M, Kriegel H P, Xu X W. Density-based clustering in spatial databases: the algorithm GDBSCAN and its applications. Data Mining and Knowledge Discov- ery, 1998, 2(2): 169-194. 被引量:1
  • 6Hinneburg A, Keim D A. An efficient approach to clustering in large multimedia databases with noise. In: Proceedings of the 4th International Conference on Knowledge Discov- ery and Data Mining. New York, USA: AAAI Press, 1998. 58-65. 被引量:1
  • 7马帅,王腾蛟,唐世渭,杨冬青,高军.一种基于参考点和密度的快速聚类算法[J].软件学报,2003,14(6):1089-1095. 被引量:108
  • 8陈卓,孟庆春,魏振钢,任丽婕,窦金凤.一种基于网格和密度凝聚点的快速聚类算法[J].哈尔滨工业大学学报,2005,37(12):1654-1657. 被引量:14
  • 9Duan L, Xiong D Y, Lee J, Feng G. A local density based spatial clustering algorithm with noise. In: Proceedings of the 2006 IEEE International Conference on Systems, Man, and Cybernetics. Taipei, China: IEEE, 2006. 4061-4066. 被引量:1
  • 10Ester M, Kriegel H P, Sander J, Wimmer M, Xu X W. In- cremental clustering for mining in a data warehousing en- vironment. In: Proceedings of the 24th Very Large Data Bases (VLDB) Conference. San Francisco, CA, USA: Mor- gan Kaufmann Publishers Inc, 1998. 323-333. 被引量:1

二级参考文献27

  • 1Han JW, Kambr M. Data Mining Concepts and Techniques. Beijing: Higher Education Press, 2001. 145-176. 被引量:1
  • 2Kaufan L, Rousseeuw PJ. Finding Groups in Data: an Introduction to Cluster Analysis. New York: John Wiley & Sons, 1990. 被引量:1
  • 3Ester M, Kriegel HP, Sander J, Xu X. A density based algorithm for discovering clusters in large spatial databases with noise. In:Simoudis E, Han JW, Fayyad UM, eds. Proceedings of the 2nd International Conference on Knowledge Discovery and Data Mining.Portland: AAAI Press, 1996. 226-231. 被引量:1
  • 4Guha S, Rastogi R, Shim K. CURE: an efficient clustering algorithm for large databases. In: Haas LM, Tiwary A, eds. Proceedings of the ACM SIGMOD International Conference on Management of Data. Seattle: ACM Press, 1998. "73-84. 被引量:1
  • 5Agrawal R, Gehrke J, Gunopolos D, Raghavan P. Automatic subspace clustering of high dimensional data for data mining application. In: Haas LM, Tiwary A, eds. Proceedings of the ACM SIGMOD International Conference on Management of Data.Seattle: ACM Press, 1998.94-105. 被引量:1
  • 6Alexandros N, Yannis T,Yannis M. C^2P: clustering based on closest pairs. In: Apers PMG, Atzeni P, Ceri S, Paraboschi S,Ramamohanarao K, Snodgrass RT, eds. Proceedings of the 27th International Conference on Very Large Data Bases. Roma:Morgan Kaufmann Publishers, 2001. 331-340. 被引量:1
  • 7Berchtold S, Bohm C, Kriegel H-P. The pyramid-technique: towards breaking the curse of dimensionality. In: Haas LM, Tiwary A,eds. Proceedings of the ACM SIGMOD International Conference on Management of Data. Seattle: ACM Press, 1998. 142- 153. 被引量:1
  • 8Yu C, Ooi BC, Tan K-L, Jagadish HV. Indexing the distance: an efficient method to KNN processing. In: Apers PMG, Atzeni P,Ceri S, Paraboschi S, Ramamohanarao K, Snodgrass RT, eds. Proceedings of the 27th International Conference on Very Large Data Bases. Roma: Morgan Kaufmann Publishers, 2001. 421--430. 被引量:1
  • 9NG R T, HAN J. Efficient and effective clustering methods for spatial data mining [ A ]. Proc of the 20th VLDB Conf [C]. Santiago: Morgan Kaufmann, 1994. 144-155. 被引量:1
  • 10ZHANG T. BIRCH: An efficient data clustering method for very large databases[ A]. Proc of the ACM S IGMOD Int' l Conf on Management of Data[ C]. Montreal: ACM Press, 1996. 73-84. 被引量:1

共引文献168

同被引文献183

引证文献28

二级引证文献202

相关作者

内容加载中请稍等...

相关机构

内容加载中请稍等...

相关主题

内容加载中请稍等...

浏览历史

内容加载中请稍等...
;
使用帮助 返回顶部