
Performance evaluation and optimization of distributed and parallel deep neural networks on the Tianhe-3 prototype system (cited by: 3)
Abstract: The deep neural network (DNN) model is an important branch of the artificial neural network (ANN) model and the foundation of deep learning. In recent years, gains in computing power and advances in high-performance computing have made it possible to improve the feature-extraction and data-fitting capabilities of DNNs by increasing network depth and model complexity. As a result, DNNs have shown advantages in natural language processing, autonomous driving, face recognition, and other tasks. However, massive datasets and complex models greatly increase the cost of training deep neural networks, so accelerating training has become a key task, with techniques ranging from low-level circuit design to distributed algorithm design. The peak performance of the domestic Tianhe-3 is designed to reach the exascale level (on the order of 10^18 operations per second), and this computing capacity offers a potential opportunity for DNN training. Taking into account the ARM architecture of the Tianhe-3 prototype, this paper uses the PyTorch framework and MPI to design CNN training strategies for a single MT-2000+ compute node, a single FT-2000+ compute node, and multi-node clusters built from them. It evaluates and optimizes the performance of these processors in distributed neural network training, providing experimental data and a theoretical basis for further improving the Tianhe-3 prototype's performance in large-scale distributed neural network training.
Authors: WEI Jia (魏嘉); ZHANG Xing-jun (张兴军); JI Ze-yu (纪泽宇); LI Jing-bo (李靖波); YUE Ying-ying (岳莹莹) (School of Computer Science and Technology, Xi'an Jiaotong University, Xi'an 710127, China)
Source: Computer Engineering & Science (《计算机工程与科学》), indexed in CSCD and the Peking University Core Journals list, 2021, No. 5, pp. 782-791 (10 pages)
Funding: National Key Research and Development Program of China (2016YFB0200902).
Keywords: Tianhe-3 prototype; deep learning; distributed training; performance evaluation; data parallelism
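As a rough illustration of the data parallelism named in the keywords, the sketch below simulates one synchronous training loop in plain Python. It is not the authors' implementation: the one-parameter model, the toy data, and every function name here are assumptions made for illustration, and `allreduce_mean` only stands in for what an MPI allreduce would do across real nodes.

```python
# Simulated synchronous data parallelism (no MPI or PyTorch required).
# Model: y = w * x with mean squared error loss. All names are illustrative.

def local_gradient(w, shard):
    """Gradient of the mean squared error of y = w * x over one worker's shard."""
    return sum(2.0 * (w * x - y) * x for x, y in shard) / len(shard)

def allreduce_mean(values):
    """Stand-in for an MPI allreduce (sum) followed by division by the worker count."""
    return sum(values) / len(values)

def data_parallel_step(w, shards, lr):
    """One synchronous step: every worker computes a gradient, then all average."""
    grads = [local_gradient(w, shard) for shard in shards]
    return w - lr * allreduce_mean(grads)

# Toy data generated from y = 3x, split evenly across 4 simulated workers.
data = [(float(x), 3.0 * x) for x in range(1, 9)]
num_workers = 4
shards = [data[i::num_workers] for i in range(num_workers)]

w = 0.0
for _ in range(200):
    w = data_parallel_step(w, shards, lr=0.01)
```

Because the shards are the same size, the averaged gradient equals the gradient over the full dataset, so every simulated worker would apply the same update and keep an identical copy of `w`.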