Abstract
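The Storm stream processing platform addresses the poor real-time performance of traditional Hadoop-based batch processing systems and provides an efficient, fast, real-time framework for processing multi-source heterogeneous big data. However, when assigning tasks, Storm considers only the ordering of available slots across nodes and does not fully account for the actual load on each node, which easily leads to load imbalance. To address this problem, this paper improves the Storm scheduling algorithm on the Storm distributed stream processing system by applying a weighted sort over the available slots and the load of each node. In addition, a data structure design guarantees that rowkeys are random and unique, which keeps the RegionServers load-balanced, and a batch-writing mechanism increases the HBase data write speed, thereby improving the storage efficiency of streaming data. Comparative experiments against native Storm show that the improved algorithm and the optimized mechanism ensure fast data writing and raise cluster resource utilization, and that the improved system has clear advantages in practicality and efficiency.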
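The abstract does not give the weighting formula, so the following is only a minimal sketch of what "weighted sorting of available slots and node load" can look like, assuming a linear score in which load is normalized to [0, 1] (for example CPU utilization). The names `Node`, `rank_nodes`, `assign_workers`, `w_slots`, and `w_load`, the default weights, and the load metric are all hypothetical; in a real deployment this logic would live inside a custom scheduler plugged into Storm's `IScheduler` interface, which is not modeled here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Node:
    """Hypothetical snapshot of a Storm supervisor node."""
    name: str
    free_slots: int  # worker slots still available on this node
    load: float      # assumed normalized load in [0, 1], e.g. CPU utilization


def rank_nodes(nodes: List[Node], w_slots: float = 0.5, w_load: float = 0.5) -> List[Node]:
    """Order nodes that have free slots by a weighted score (higher is better).

    score = w_slots * free_slots / max_free_slots + w_load * (1 - load)
    The linear form and default weights are illustrative assumptions.
    """
    candidates = [n for n in nodes if n.free_slots > 0]
    if not candidates:
        return []
    max_free = max(n.free_slots for n in candidates)

    def score(n: Node) -> float:
        return w_slots * n.free_slots / max_free + w_load * (1.0 - n.load)

    return sorted(candidates, key=score, reverse=True)


def assign_workers(nodes: List[Node], num_workers: int,
                   w_slots: float = 0.5, w_load: float = 0.5) -> List[str]:
    """Greedily place workers one by one on the best-ranked node.

    Free slots are decremented after each placement; node load is left
    unchanged because its reaction to a new worker is not modeled here.
    """
    free = {n.name: n.free_slots for n in nodes}
    plan: List[str] = []
    for _ in range(num_workers):
        snapshot = [Node(n.name, free[n.name], n.load) for n in nodes]
        ranked = rank_nodes(snapshot, w_slots, w_load)
        if not ranked:
            break  # no free slots left anywhere in the cluster
        best = ranked[0].name
        plan.append(best)
        free[best] -= 1
    return plan


# Example cluster: node "a" has the most free slots but is heavily loaded.
EXAMPLE_NODES = [Node("a", 4, 0.9), Node("b", 2, 0.1), Node("c", 3, 0.5)]
RANKING = [n.name for n in rank_nodes(EXAMPLE_NODES)]  # ['b', 'c', 'a']
PLAN = assign_workers(EXAMPLE_NODES, 3)                # ['b', 'c', 'b']
```

With `w_load=0` the ranking degenerates to a slot-only order (`a`, `c`, `b`), which puts the busiest node first; giving load a nonzero weight moves it to the back of the queue.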
Compared with Hadoop, Storm has the advantage of real-time data stream processing, which provides an efficient, fast, real-time data processing framework for multi-source heterogeneous data processing. However, worker assignment in the Storm cluster only considers the ordering of available slots across nodes while ignoring the current load of the nodes, which may fail to meet the requirement of load balancing when more than one topology is running. To improve efficiency and achieve load balancing in real-time stream processing, a Storm scheduling algorithm is proposed that performs a weighted sort of available slots and node load conditions, addressing the load imbalance of Storm-based systems. Moreover, through a reasonable data structure design, the rowkeys in HBase are generated randomly and evenly, which ensures load balance across the RegionServers, improves the utilization of cluster resources, and greatly increases the speed of data writing. Comparison experiments with the original Storm system show that the algorithm improvement and mechanism optimization ensure fast data writing and improve the utilization of cluster resources. The improved system has obvious advantages in practicality and efficiency.
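The rowkey design and the batch-writing mechanism are described only at a high level. The sketch below shows one common way to obtain both properties: a hash-prefixed ("salted") rowkey that is random across regions yet unique per record, plus a buffered writer that flushes rows in batches. The key layout, `NUM_SALT_BUCKETS`, `make_rowkey`, `BatchWriter`, and the `sink` callable are assumptions for illustration, not the paper's actual data structure and not the HBase client API; `sink` merely stands in for a bulk call such as the Java client's `Table.put(List<Put>)`.

```python
import hashlib
from typing import Callable, List, Tuple

# Assumed number of pre-split HBase regions; one salt bucket per region.
NUM_SALT_BUCKETS = 8

Row = Tuple[bytes, bytes]


def make_rowkey(source_id: str, timestamp_ms: int, seq: int,
                buckets: int = NUM_SALT_BUCKETS) -> bytes:
    """Build a salted rowkey of the form b"BB|source_id|timestamp|seq".

    The two-digit bucket BB comes from an MD5 hash of the natural key, so
    consecutive records scatter across buckets (regions) instead of
    hot-spotting one RegionServer. source_id + timestamp + per-source
    sequence number keep the key unique. Assumes source_id contains no "|"
    and buckets <= 100.
    """
    natural_key = f"{source_id}|{timestamp_ms:013d}|{seq:06d}"
    digest = hashlib.md5(natural_key.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % buckets
    return f"{bucket:02d}|{natural_key}".encode("utf-8")


class BatchWriter:
    """Buffer rows and hand them to `sink` in batches of `batch_size`.

    `sink` is a hypothetical stand-in for a bulk write call.
    """

    def __init__(self, sink: Callable[[List[Row]], None], batch_size: int = 500):
        self.sink = sink
        self.batch_size = batch_size
        self.buffer: List[Row] = []

    def write(self, rowkey: bytes, value: bytes) -> None:
        self.buffer.append((rowkey, value))
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self) -> None:
        """Send any buffered rows, e.g. on a timer tick or at shutdown."""
        if self.buffer:
            self.sink(self.buffer)
            self.buffer = []  # new list, so the sink keeps the batch it received


# Example: 10 records from one source, written in batches of 4.
BATCHES: List[List[Row]] = []
writer = BatchWriter(BATCHES.append, batch_size=4)
for i in range(10):
    writer.write(make_rowkey("sensor-1", 1_500_000_000_000 + i, i), b"payload")
writer.flush()
```

Salting trades scan locality for write spread: a time-range query must fan out over every bucket prefix and merge the results. The batch size likewise trades per-record latency for throughput; inside a Storm bolt the final `flush()` would typically be driven by a periodic signal such as tick tuples.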
Source
Computer and Modernization (《计算机与现代化》)
2017, Issue 12, pp. 65-70, 76 (7 pages in total)
Funding
Self-funded projects of the 32nd Research Institute of China Electronics Technology Group Corporation (ZQ160006, ZQ160007)
Keywords
Storm
stream processing
distributed computing
batch processing
load balancing