Research on a Distributed K-means Clustering Algorithm Based on Node Data Density (cited by: 5)

Distributed K-means clustering by learning data density in local peer
Abstract: Distributed clustering over a P2P (peer-to-peer) network uses the computing and storage capacity of each peer, together with the bandwidth of the network, to spread an algorithm's time and space complexity across the peers. This makes it possible to process and analyze massive distributed data sets and overcomes the data-processing limits of traditional centralized clustering algorithms that run on a single server. This paper proposes a distributed K-means clustering algorithm based on a per-peer confidence radius. The algorithm computes the density of the data distribution on each peer to find where the data of a given cluster is dense and where it is sparse on that peer, and from this it determines a clustering confidence radius that guides the next clustering step. Experiments show that the algorithm effectively reduces the number of iterations and saves network bandwidth, while its clustering results are close to those of a centralized clustering algorithm.
Source: Application Research of Computers (《计算机应用研究》), indexed in CSCD and the Peking University Core Journal list, 2011, No. 10, pp. 3643-3645, 3655 (4 pages)
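Funding: National Natural Science Foundation of China (61005017); National Science and Technology Innovation Fund (10C26213200946); Natural Science Foundation of Jiangsu Province (BK2009199); Natural Science Basic Research Program of Jiangsu Higher Education Institutions (10KJB520005); Jiangsu University Senior Talent Fund (1283000347); Jiangsu Province Science and Technology Innovation Program (BC2009265)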
Keywords: peer-to-peer (P2P) technology; K-means clustering; self-adjustment; confidence radius
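
The abstract names two ingredients: K-means statistics computed locally on each peer, and a density-derived confidence radius per cluster. The sketch below is a minimal illustration under stated assumptions, not the paper's method. Merging per-cluster sums and counts across peers is the standard way to reproduce a centralized K-means centroid update. The `confidence_radius` rule is purely hypothetical: it grows the radius until the first unusually large gap in the sorted member distances, and the `gap_factor` parameter is invented here. All function names are illustrative.

```python
import math


def nearest(point, centroids):
    """Index of the centroid closest to `point` (Euclidean distance)."""
    return min(range(len(centroids)),
               key=lambda k: math.dist(point, centroids[k]))


def local_stats(points, centroids):
    """Per-cluster coordinate sums and member counts for one peer's data."""
    dim = len(centroids[0])
    sums = [[0.0] * dim for _ in centroids]
    counts = [0] * len(centroids)
    for p in points:
        k = nearest(p, centroids)
        counts[k] += 1
        for d in range(dim):
            sums[k][d] += p[d]
    return sums, counts


def merge_centroids(peer_stats, centroids):
    """Combine (sums, counts) from all peers into updated centroids.

    Equivalent to one centralized K-means update over the union of the
    peers' data; a cluster with no members keeps its old centroid.
    """
    dim = len(centroids[0])
    updated = []
    for k, old in enumerate(centroids):
        n = sum(counts[k] for _, counts in peer_stats)
        if n == 0:
            updated.append(list(old))
            continue
        updated.append([sum(sums[k][d] for sums, _ in peer_stats) / n
                        for d in range(dim)])
    return updated


def confidence_radius(members, centroid, gap_factor=2.0):
    """HYPOTHETICAL density rule (an assumption, not taken from the paper).

    Sort the members' distances to the centroid and return the distance
    just before the first gap larger than `gap_factor` times the mean gap,
    i.e. the point where the locally dense region gives way to a sparse one.
    """
    dists = sorted(math.dist(p, centroid) for p in members)
    if len(dists) < 2:
        return dists[0] if dists else 0.0
    gaps = [b - a for a, b in zip(dists, dists[1:])]
    mean_gap = sum(gaps) / len(gaps)
    for i, gap in enumerate(gaps):
        if gap > gap_factor * mean_gap:
            return dists[i]
    return dists[-1]  # no sparse region found: all members are dense


# Usage: two peers, two clusters, one merge round.
peer_a = [(0.0, 0.0), (0.0, 2.0), (10.0, 10.0)]
peer_b = [(2.0, 0.0), (2.0, 2.0), (12.0, 10.0)]
start = [(1.0, 1.0), (11.0, 10.0)]
stats = [local_stats(peer_a, start), local_stats(peer_b, start)]
new_centroids = merge_centroids(stats, start)
```

Only the per-cluster sums and counts cross the network in this sketch, so each round costs a message of size proportional to the number of clusters times the dimension, independent of how many points a peer holds. How the confidence radius then prunes or shortens later rounds is left open here, because the abstract does not specify it.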