
Performance Prediction Model for Spark Based on Key Stages Analysis (cited by: 2)
Abstract: Spark is a widely used computing platform for big data processing, and allocating cluster resources sensibly plays an important role in optimizing the performance of Spark jobs. Performance prediction is the foundation of, and the key to, optimizing cluster resource allocation, so this paper proposes a performance prediction model for Spark. The paper takes job execution time as the measure of Spark performance and introduces the concept of the key stages of a Spark job. The model is built by running small input datasets to obtain the relationship between the running time of the key stages and the amount of job input data. The experimental results show that the model is fairly effective.
Authors: GE Qing-Bao (葛庆宝); TAO Yao-Dong (陶耀东); GAO Cen (高岑); TIAN Yue (田月); MENG Xiang-Ru (孟祥茹). Affiliations: University of Chinese Academy of Sciences, Beijing 100049, China; Shenyang Institute of Computing Technology, Chinese Academy of Sciences, Shenyang 110168, China
Source: Computer Systems & Applications (《计算机系统应用》), 2018, No. 8, pp. 232-236 (5 pages)
Keywords: Spark; resource allocation; performance prediction; key stages
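
The abstract describes the approach only at a high level: run the job on small input datasets, relate the running time of each key stage to the input data volume, and use that relationship to predict job execution time. The sketch below illustrates one way this could look. It is not the authors' model. It assumes that each key stage's time grows linearly with input size and that job time is the sum of the key-stage times, neither of which the abstract states. The stage names and measurements in `SAMPLE_RUNS` are hypothetical placeholders, not data from the paper.

```python
from typing import Dict, List, Tuple

# One sample run: (input size in GB, stage run time in seconds).
Sample = Tuple[float, float]
# A fitted line: (slope, intercept).
Line = Tuple[float, float]


def fit_line(samples: List[Sample]) -> Line:
    """Ordinary least-squares fit of time = slope * size + intercept.

    The linear form is an assumption of this sketch; the abstract does
    not say which functional form the paper uses.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two sample runs per stage")
    mean_x = sum(x for x, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in samples)
    if sxx == 0:
        raise ValueError("sample runs must cover at least two input sizes")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in samples)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def fit_job_model(stage_samples: Dict[str, List[Sample]]) -> Dict[str, Line]:
    """Fit one line per key stage from its small-dataset sample runs."""
    return {stage: fit_line(runs) for stage, runs in stage_samples.items()}


def predict_job_time(model: Dict[str, Line], input_size: float) -> float:
    """Predict job time as the sum of predicted key-stage times (assumed)."""
    return sum(slope * input_size + intercept
               for slope, intercept in model.values())


# Hypothetical measurements from small-input runs, for illustration only.
SAMPLE_RUNS: Dict[str, List[Sample]] = {
    "stage_a": [(1.0, 12.0), (2.0, 22.0), (4.0, 42.0)],
    "stage_b": [(1.0, 5.0), (2.0, 8.0), (4.0, 14.0)],
}

if __name__ == "__main__":
    model = fit_job_model(SAMPLE_RUNS)
    print(f"predicted time for 64 GB: {predict_job_time(model, 64.0):.1f} s")
```

In practice the per-stage timings would come from Spark's own stage metrics, and a non-linear fit may be more appropriate for stages dominated by shuffle or sort; the sketch leaves both choices open.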
