期刊文献+

多代表点特征树与空间聚类算法 被引量:5

Multi-representation Feature Tree and Spatial Clustering Algorithm
下载PDF
导出
摘要 空间数据具有海量、复杂、连续、空间自相关、存在缺损与误差等的特点,要求空间聚类算法具有高效率,能处理各种复杂形状的簇,聚类结果与数据空间分布顺序无关,并且对离群点是健壮的等性能,已有的算法难以同时满足要求。本文提出了一个适合处理海量复杂空间数据的数据结构-多代表点特征树。基于多代表点特征树提出了适合挖掘海量复杂空间数据聚类算法CAMFT,该算法利用多代表点特征树对海量的数据进行压缩,结合随机采样的方法进一步增强算法处理海量数据的能力;同时,多代表点特征树能够保存复杂形状的聚类特征,适合处理复杂空间数据。实验表明了算法CAMFT能够快速处理带有离群点的复杂形状聚类的空间数据,结果与对象空间分布顺序无关,并且效率优于已有的同类聚类算法BIRCH与CURE。 Spatial data have the features of largeness, complexity, continuity, spatial autocorrelation, missing data and error in spatial database. These characters require that a good spatial clustering algorithm must be high efficient, and should be able to detect clusters of complicated shapes, and the dusters found should be independent of the order in which the points in the space are examined, and should be not be impacted by outliers. The existed algorithms can not work well, Clustering algorithm based on multi-representation feature tree named CAMFT is proposed, A new data structure is firstly proposed to condense data, which drew the strongpoint from BIRCH algorithm and CURE algorithm, and then the algorithm that included the idea of random sampling is proposed to enhance the ability to detect very large data, As well as, the multi-representation feature tree can keep clusters of complicated shapes, so it can be used to detect spatial clusters. Experimental results show the algorithm can identify clusters of complicated shapes efficiently in large spatial database that have many outliers, and outperform BIRCH algorithm and CURE algorithm in efficiency.
出处 《计算机科学》 CSCD 北大核心 2006年第12期189-195,共7页 Computer Science
基金 国家自然科学基金(No.49971063) 国家高技术研究发展计划(863)(No.2001AA6330101-04) 航空科学基金项目(02F52033) 江苏省自然科学基金(No.BK2001045)。
关键词 空间聚类 空间数据 多代表点特征树 Spatial clustering, Spatial data, Multi representation feature tree
  • 相关文献

参考文献33

  • 1Dunham MH. Data mining introductory and advanced topics. Upper ,Saddle River, N.J. : Prentice Hall/Pearson Education, 2003.221-243 被引量:1
  • 2MacQueen J. Some methods for classification and analysis of multivariate observations. In: Proc. 5th Berkeley Symposium in Mathematics. Univ. of California, Berkeley, USA: Statistics and Probability, 1967 被引量:1
  • 3Lauritzen SL. The EM algorithm for graphical association model with missing data. Computational Statistics and Data Analysis,1995, 19(2):191-201 被引量:1
  • 4Kaufman L, Rousseeuw PJ. Finding Groups in Data: An Introduction to cluster Analysis. New York: John Wiley & Sons,1990 被引量:1
  • 5Zhang T, Ramakrishnan R, Livny M. BIRCH, an efficient data clustering method for very large databases. In:Proceedings of the International Conference Management of Data (ACM-SIGMOD).Montreal, Canada, 1996. 103-114 被引量:1
  • 6Guha S, Rastogi R, Shim k. CURE.. An Efficient Clustering Algorithm for Large Databases. In:Proc, 1998 ACM-SIGMOD Int.Conf, Management of Data (SIGMOD'98). Seattle , Washington, 1998. 73-84 被引量:1
  • 7Guha S, Rastogi R, Shim k. Rock: A robust clustering algorithm for categorical attributes. In:Proc. 1999 Int. Conf. Data Engineering (ICDE'99). Sydney, Australia, 1999. 512-521 被引量:1
  • 8Ester M, Kriegel H-P, Sander J, Xu X. A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise. In:Proc. 2nd Int. Conf. on Knowledge Discovery and Data Mining. Portland, OR, 1996. 226-231 被引量:1
  • 9Ankerst M, Breunig M, Kriegel HP, Sander J. OPTICS: Ordering points to identify the clustering structure. In: Proc. 1999 ACM-SIGMOD Int. Conf. Management of Data (DIGMOD'99).Philadelphia, PA, 1999. 49-60 被引量:1
  • 10Hinneburg A, Keim DA. An efficient approach to clustering in large multimedia databases with noise, In:Proc. 1998 Int. Conf.Knowledge Discovery and Data Mining (KDD'98). New York,1998 被引量:1

二级参考文献8

  • 1Han JW, Kambr M. Data Mining Concepts and Techniques. Beijing: Higher Education Press, 2001. 145-176. 被引量:1
  • 2Kaufan L, Rousseeuw PJ. Finding Groups in Data: an Introduction to Cluster Analysis. New York: John Wiley & Sons, 1990. 被引量:1
  • 3Ester M, Kriegel HP, Sander J, Xu X. A density based algorithm for discovering clusters in large spatial databases with noise. In:Simoudis E, Han JW, Fayyad UM, eds. Proceedings of the 2nd International Conference on Knowledge Discovery and Data Mining.Portland: AAAI Press, 1996. 226-231. 被引量:1
  • 4Guha S, Rastogi R, Shim K. CURE: an efficient clustering algorithm for large databases. In: Haas LM, Tiwary A, eds. Proceedings of the ACM SIGMOD International Conference on Management of Data. Seattle: ACM Press, 1998. "73-84. 被引量:1
  • 5Agrawal R, Gehrke J, Gunopolos D, Raghavan P. Automatic subspace clustering of high dimensional data for data mining application. In: Haas LM, Tiwary A, eds. Proceedings of the ACM SIGMOD International Conference on Management of Data.Seattle: ACM Press, 1998.94-105. 被引量:1
  • 6Alexandros N, Yannis T,Yannis M. C^2P: clustering based on closest pairs. In: Apers PMG, Atzeni P, Ceri S, Paraboschi S,Ramamohanarao K, Snodgrass RT, eds. Proceedings of the 27th International Conference on Very Large Data Bases. Roma:Morgan Kaufmann Publishers, 2001. 331-340. 被引量:1
  • 7Berchtold S, Bohm C, Kriegel H-P. The pyramid-technique: towards breaking the curse of dimensionality. In: Haas LM, Tiwary A,eds. Proceedings of the ACM SIGMOD International Conference on Management of Data. Seattle: ACM Press, 1998. 142- 153. 被引量:1
  • 8Yu C, Ooi BC, Tan K-L, Jagadish HV. Indexing the distance: an efficient method to KNN processing. In: Apers PMG, Atzeni P,Ceri S, Paraboschi S, Ramamohanarao K, Snodgrass RT, eds. Proceedings of the 27th International Conference on Very Large Data Bases. Roma: Morgan Kaufmann Publishers, 2001. 421--430. 被引量:1

共引文献297

同被引文献42

引证文献5

二级引证文献11

相关作者

内容加载中请稍等...

相关机构

内容加载中请稍等...

相关主题

内容加载中请稍等...

浏览历史

内容加载中请稍等...
;
使用帮助 返回顶部