Abstract
To address the substantial communication overhead, long training time, and low resource utilization that arise when large-scale deep neural networks are accelerated with data parallelism, a dynamic layer-wise gradient sparsification and gradient merging optimization method is proposed. First, gradient sparsification compression is combined with pipeline parallelism to form a dynamic layer-wise gradient sparsification method: each network layer is assigned a suitable threshold, which is adjusted dynamically in subsequent iterations so that the gradients transmitted by each layer are compressed adaptively. Then, a layer gradient merging method is proposed. Using dynamic programming, it trades off the communication overhead, sparsification cost, and layer gradient computation time involved in merging, solves for the optimal layer-merging combination, and merges the small gradient tensors of multiple layers into a single communication, thereby reducing the excessive communication latency introduced by per-layer gradient decisions. Finally, the optimal layer-merging combination obtained is applied to the actual training iterations. Experimental results show that, compared with existing methods, the proposed method greatly reduces communication overhead and speeds up training while maintaining model accuracy; compared with the uncompressed baseline, training speed is improved by up to 1.99 times.
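As a rough illustration of the two ideas described in the abstract, the following Python sketch (NumPy only) shows (1) per-layer magnitude-threshold sparsification with a dynamically adjusted threshold and (2) a dynamic-programming search for a contiguous layer-merging plan that trades per-message startup latency against waiting for layer gradients to become ready. All function names, the threshold-update rule, and the simple alpha + beta·size communication model are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def sparsify_layer(grad, threshold):
    """Keep only the gradient entries whose magnitude exceeds this layer's threshold."""
    mask = np.abs(grad) > threshold
    values, indices = grad[mask], np.flatnonzero(mask)
    density = mask.mean()                        # fraction of elements actually transmitted
    return values, indices, density

def adjust_threshold(threshold, density, target_density, rate=0.1):
    """Nudge the per-layer threshold so the achieved density tracks the target density.

    If too many elements passed the filter, raise the threshold; if too few, lower it.
    (Assumed multiplicative update rule for illustration only.)
    """
    return threshold * (1.0 + rate * (density - target_density) / max(target_density, 1e-8))

def best_merge_plan(sizes, ready_times, alpha, beta):
    """Partition consecutive layers (in backward-pass order) into merged all-reduce groups.

    sizes[i]       - compressed gradient size of layer i (bytes)
    ready_times[i] - time at which layer i's sparsified gradient is available
    alpha, beta    - per-message startup latency and per-byte transfer cost
    Returns the chosen group boundaries and the estimated finish time of the last message.
    """
    n = len(sizes)
    prefix = np.concatenate(([0.0], np.cumsum(sizes)))   # prefix sums of group sizes
    ready = np.maximum.accumulate(ready_times)           # group [i, j) is ready at ready[j-1]
    finish = [0.0] + [np.inf] * n                        # finish[k]: first k layers fully sent
    choice = [0] * (n + 1)
    for j in range(1, n + 1):
        for i in range(j):
            start = max(finish[i], ready[j - 1])         # wait for previous message and for compute
            cost = start + alpha + beta * (prefix[j] - prefix[i])
            if cost < finish[j]:
                finish[j], choice[j] = cost, i
    groups, j = [], n
    while j > 0:                                         # backtrack the chosen boundaries
        groups.append((choice[j], j))
        j = choice[j]
    return groups[::-1], finish[n]

if __name__ == "__main__":
    # Hypothetical example: four layers with compressed sizes (bytes) and ready times (ms).
    plan, t = best_merge_plan(sizes=[2e5, 1e5, 4e6, 3e5],
                              ready_times=[1.0, 2.0, 5.0, 6.0],
                              alpha=0.5, beta=1e-6)
    print(plan, t)
```

The sketch captures the sparsification cost only through the compressed sizes fed to the merging search; a faithful reproduction of the paper's method would also account for compression time and the pipeline overlap described in the abstract.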
Authors
JU Tao (巨涛)
KANG Heting (康贺廷)
LIU Shuai (刘帅)
HUO Jiuyuan (火久元)
School of Electronic and Information Engineering, Lanzhou Jiaotong University, Lanzhou 730070, China
Source
Journal of Xi'an Jiaotong University (西安交通大学学报)
Indexed in: EI, CAS, CSCD, Peking University Core Journals (北大核心)
2024, No. 9, pp. 105-116 (12 pages)
Funding
National Natural Science Foundation of China (61862037, 62262038)
Science and Technology Program of Gansu Province (23CXGA0028)
Keywords
deep neural network
distributed training
synchronous data parallelism
gradient compression
layer gradient merging