
Load Balancing Strategy on MapReduce with Locality-aware (cited by: 4)
Abstract Existing research on load-balanced scheduling for MapReduce considers neither the distribution of intermediate data nor the cost of network transfer, which leads to extra network traffic and lower system efficiency. To solve this problem, this paper presents a locality-aware load balancing strategy. Taking advantage of the new resource management features brought by YARN, the strategy collects statistics while buffered in-memory data are spilled to local disk in the Map phase, and so obtains the data distribution. It then schedules the reduce tasks according to that distribution and the processing capacity of each node, reducing network overhead while keeping the load on each node as balanced as possible. In addition, the paper introduces fine-grained partitioning and adaptive partition splitting to further improve the scheduling strategy's performance under data skew. Comparative experimental results show that the proposed strategy effectively improves performance while also reducing the total network traffic overhead.
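Source Computer Science (《计算机科学》), CSCD, Peking University Core Journal, 2015, No. 10, pp. 50-56 (7 pages)
Funding Supported by the National Natural Science Foundation of China (61373015, 61300052), the Specialized Research Fund for the Doctoral Program of Higher Education of the Ministry of Education of China (20103218110017), the Priority Academic Program Development of Jiangsu Higher Education Institutions (PAPD), and the Fundamental Research Funds for the Central Universities (NP2013307, NZ2013306)
Keywords MapReduce, data locality, data skew, load balancing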
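
The scheduling idea summarized in the abstract (place each reduce partition on a node using both where its intermediate data already sits and how loaded each node is) can be illustrated with a small greedy sketch. This is not the paper's algorithm, which this record does not reproduce; the function names, the `alpha` weight that trades remote bytes against estimated finish time, and the largest-first ordering are all assumptions made for illustration.

```python
from typing import List


def assign_partitions(dist: List[List[int]], speed: List[float],
                      alpha: float = 1.0) -> List[int]:
    """Greedy, hypothetical locality-aware assignment of partitions to nodes.

    dist[p][n] is the number of bytes of partition p already stored on node n
    (the kind of statistic that could be gathered at spill time).
    speed[n] is the assumed processing rate of node n (bytes per time unit).
    alpha is an assumed weight for the cost of each remotely fetched byte.
    Returns, for each partition, the index of the node chosen for it.
    """
    n_nodes = len(speed)
    load = [0.0] * n_nodes            # estimated busy time per node so far
    assignment = [-1] * len(dist)

    # Handle the largest partitions first so they get the widest choice.
    order = sorted(range(len(dist)), key=lambda p: sum(dist[p]), reverse=True)
    for p in order:
        total = sum(dist[p])
        best_node, best_cost = 0, float("inf")
        for n in range(n_nodes):
            remote = total - dist[p][n]             # bytes to pull over the network
            finish = load[n] + total / speed[n]     # estimated finish time on node n
            cost = finish + alpha * remote
            if cost < best_cost:
                best_node, best_cost = n, cost
        assignment[p] = best_node
        load[best_node] += total / speed[best_node]
    return assignment


def remote_bytes(dist: List[List[int]], assignment: List[int]) -> int:
    """Total bytes that must cross the network under the given assignment."""
    return sum(sum(row) - row[n] for row, n in zip(dist, assignment))


if __name__ == "__main__":
    # Two partitions, two equally fast nodes; each partition is mostly local
    # to a different node, so locality and balance agree.
    dist = [[90, 10], [10, 90]]
    plan = assign_partitions(dist, speed=[1.0, 1.0])
    print(plan, remote_bytes(dist, plan))
```

With a skewed input such as `[[100, 0], [100, 0]]` and a small `alpha`, the same sketch moves one partition off the crowded node, accepting some network transfer in exchange for balance; a large `alpha` keeps both partitions local instead.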

