
Distributed Clustering Algorithm for Network Data Based on MapReduce

Cited by: 9
Abstract: Traditional clustering algorithms cannot effectively analyze large-scale network data, because their time and space complexity is high and the memory of a single physical machine is insufficient. To address this problem, this paper proposes a distributed clustering algorithm for network data based on the MapReduce distributed model. Following MRC theory, the algorithm is designed to run in a limited number of MapReduce rounds, which controls the time spent in the shuffle phase, and it uses in-Map combining to control network traffic. When intermediate results are merged, only communities are merged and the nodes inside them are not considered, which controls memory overhead. Experiments run on a cluster with simulated data show that the CAMR algorithm achieves good speedup and scalability as the data size and the cluster size increase.
Source: Computer Engineering (《计算机工程》), CAS, CSCD, 2013, No. 7, pp. 76-82 (7 pages)
Funding: Natural Science Foundation of Liaoning Province (20102059)
Keywords: clustering algorithm; distributed clustering; MapReduce programming model; data mining; community structure
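
The two cost-control ideas named in the abstract, combining inside the Map task and merging at the community level only, can be illustrated with a minimal sketch in plain Python. This is not the paper's CAMR implementation. The function names, the `community_of` lookup table, and the merge rule (merge two communities when the number of edges between them reaches a `threshold`) are all assumptions made for illustration, since the abstract does not specify the merge criterion.

```python
from collections import defaultdict


def map_with_in_mapper_combining(edges, community_of):
    """Map one input split (hypothetical helper).

    Instead of emitting one record per edge, edge counts are accumulated
    locally and one record per community pair is emitted, which is what
    reduces the volume of data sent through the shuffle.
    """
    partial = defaultdict(int)
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        if cu != cv:
            key = (cu, cv) if cu < cv else (cv, cu)
            partial[key] += 1  # combine inside the mapper
    return list(partial.items())


def reduce_merge_communities(records, threshold):
    """Merge communities using community IDs only (hypothetical helper).

    ASSUMPTION: two communities merge when the total number of edges
    between them is at least `threshold`. Member nodes are never loaded,
    so memory use grows with the number of communities, not nodes.
    """
    totals = defaultdict(int)
    for key, count in records:
        totals[key] += count

    parent = {}

    def find(c):
        parent.setdefault(c, c)
        while parent[c] != c:
            parent[c] = parent[parent[c]]  # path halving
            c = parent[c]
        return c

    for (a, b), total in totals.items():
        ra, rb = find(a), find(b)
        if total >= threshold and ra != rb:
            parent[max(ra, rb)] = min(ra, rb)

    # Map each community seen in the records to its merged representative.
    return {c: find(c) for c in list(parent)}


if __name__ == "__main__":
    community_of = {1: "A", 2: "A", 3: "B", 4: "B", 5: "C", 6: "C"}
    split_1 = [(1, 2), (1, 3), (2, 3)]
    split_2 = [(3, 4), (2, 4), (5, 6)]
    shuffled = (map_with_in_mapper_combining(split_1, community_of)
                + map_with_in_mapper_combining(split_2, community_of))
    print(reduce_merge_communities(shuffled, threshold=3))
```

In this toy run the six input edges shrink to two shuffled records, one per split, and the reduce step touches only the community labels. Communities with no edges to other communities (here `C`) never reach the reducer at all.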

