Discrete Particle Swarm Optimization Algorithm for MapReduce Load Balance (cited by: 1)
Abstract: MapReduce is one of the core models of Hadoop and is widely used in big data processing. The model divides computation into two stages, Map and Reduce. Because of its built-in partitioning mechanism, however, the Reduce stage can suffer from data skew, in which the load is spread unevenly across reducers. To address this problem, the paper proposes using a discrete particle swarm optimization algorithm to balance the data load in the Reduce stage. The data partitioning strategy is combined with the particle swarm algorithm to improve system stability: an objective function that favors balanced data partitions is defined, and the discrete particle swarm algorithm is used to solve it. The experimental results show that, for every number of Reduce tasks tested, the discrete particle swarm partitioning scheme had the shortest running time; the authors conclude that it effectively resolves the partition imbalance and greatly improves the computational efficiency of the system.
Authors: LI Anying (李安颖); CHEN Qun (陈群); SONG He (宋荷) (School of Computer Science and Engineering, Northwestern Polytechnical University, Xi'an 710072, China)
Source: Process Automation Instrumentation (《自动化仪表》), CAS, 2018, No. 12, pp. 56-59 (4 pages)
Keywords: Distributed computing; Discrete particle swarm optimization algorithm; Data skew; Data balance; Partition
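The abstract does not state the objective function or the discrete update rule the authors use, so the following is only a minimal sketch under stated assumptions, not the paper's method. It assumes each intermediate key has a known record count, takes the heaviest reducer load as the objective to minimize, and uses one common discrete PSO variant in which each position component is copied from the personal best, copied from the global best, randomly reassigned, or left unchanged. All function names, parameters, and the toy hash are hypothetical.

```python
import random
from typing import List, Optional, Sequence, Tuple


def reducer_loads(assignment: Sequence[int], sizes: Sequence[int], n_reducers: int) -> List[int]:
    """Total number of records each reducer receives under a key -> reducer assignment."""
    loads = [0] * n_reducers
    for reducer, size in zip(assignment, sizes):
        loads[reducer] += size
    return loads


def imbalance(assignment: Sequence[int], sizes: Sequence[int], n_reducers: int) -> int:
    """Assumed objective: the heaviest reducer load (lower means better balanced)."""
    return max(reducer_loads(assignment, sizes, n_reducers))


def hash_partition(keys: Sequence[str], n_reducers: int) -> List[int]:
    """Baseline in the spirit of hash(key) mod R, using a deterministic toy hash."""
    return [sum(key.encode()) % n_reducers for key in keys]


def dpso_partition(
    sizes: Sequence[int],
    n_reducers: int,
    initial: Optional[Sequence[int]] = None,
    n_particles: int = 30,
    n_iters: int = 200,
    p_personal: float = 0.3,
    p_global: float = 0.3,
    p_mutate: float = 0.05,
    seed: int = 0,
) -> Tuple[List[int], int]:
    """Search for a balanced key -> reducer assignment with a discrete PSO variant.

    A particle is a list whose d-th entry is the reducer chosen for key d.
    If `initial` is given, it seeds one particle, so the returned cost is
    never worse than the cost of that starting assignment.
    """
    rng = random.Random(seed)
    n_keys = len(sizes)
    swarm = [[rng.randrange(n_reducers) for _ in range(n_keys)] for _ in range(n_particles)]
    if initial is not None:
        swarm[0] = list(initial)

    personal_best = [particle[:] for particle in swarm]
    personal_cost = [imbalance(particle, sizes, n_reducers) for particle in swarm]
    best_idx = min(range(n_particles), key=personal_cost.__getitem__)
    global_best = personal_best[best_idx][:]
    global_cost = personal_cost[best_idx]

    for _ in range(n_iters):
        for i, particle in enumerate(swarm):
            for d in range(n_keys):
                r = rng.random()
                if r < p_personal:
                    particle[d] = personal_best[i][d]      # pull toward personal best
                elif r < p_personal + p_global:
                    particle[d] = global_best[d]           # pull toward global best
                elif r < p_personal + p_global + p_mutate:
                    particle[d] = rng.randrange(n_reducers)  # random exploration
                # otherwise the component keeps its current value
            cost = imbalance(particle, sizes, n_reducers)
            if cost < personal_cost[i]:
                personal_best[i], personal_cost[i] = particle[:], cost
                if cost < global_cost:
                    global_best, global_cost = particle[:], cost
    return global_best, global_cost


if __name__ == "__main__":
    # Hypothetical skewed workload: two hot keys dominate the record counts.
    sizes = [500, 20, 30, 10, 400, 40, 5, 15, 25, 35]
    keys = [f"key{i}" for i in range(len(sizes))]
    baseline = hash_partition(keys, 3)
    best, cost = dpso_partition(sizes, 3, initial=baseline)
    print("hash loads:", reducer_loads(baseline, sizes, 3))
    print("DPSO loads:", reducer_loads(best, sizes, 3))
```

Seeding the swarm with the hash-based assignment is a design choice made for this sketch: it guarantees the search result is at least as balanced as the baseline it is compared against. In a real job the per-key record counts would have to be estimated, for example by sampling the Map output, which the abstract does not describe.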
