Funding: Supported by the China Postdoctoral Science Foundation under Grant No. 2022M721707; the National Natural Science Foundation of China under Grant Nos. 62002175 and 62272248; the Special Funding for Excellent Enterprise Technology Correspondent of Tianjin under Grant No. 21YDTPJC00380; and the Open Project Foundation of the Information Security Evaluation Center of Civil Aviation, Civil Aviation University of China, under Grant No. ISECCA-202102.
Abstract: Exploring the expected quantizing scheme with a suitable mixed-precision policy is the key to compressing deep neural networks (DNNs) with high efficiency and accuracy. This exploration implies heavy workloads for domain experts, so an automatic compression method is needed. However, the huge search space of such an automatic method entails a large computing budget, which makes the automatic process challenging to apply in real scenarios. In this paper, we propose an end-to-end framework named AutoQNN for automatically quantizing different layers with different schemes and bitwidths, without any human labor. AutoQNN can efficiently seek desirable quantizing schemes and mixed-precision policies for mainstream DNN models by combining three techniques: quantizing scheme search (QSS), quantizing precision learning (QPL), and quantized architecture generation (QAG). QSS introduces five quantizing schemes and defines three new schemes as a candidate set for scheme search, and then uses the Differentiable Neural Architecture Search (DNAS) algorithm to seek the layer- or model-desired scheme from the set. QPL is, to the best of our knowledge, the first method to learn mixed-precision policies by reparameterizing the bitwidths of quantizing schemes. QPL efficiently optimizes both the classification loss and the precision loss of DNNs and obtains a relatively optimal mixed-precision model within a limited model size and memory footprint. QAG is designed to convert arbitrary architectures into corresponding quantized ones without manual intervention, to facilitate end-to-end neural network quantization. We have implemented AutoQNN and integrated it into Keras. Extensive experiments demonstrate that AutoQNN consistently outperforms state-of-the-art quantization methods. For 2-bit weights and activations of AlexNet and ResNet18, AutoQNN achieves accuracies of 59.75% and 68.86%, respectively, improving on state-of-the-art methods by up to 1.65% and 1.74%, respectively. In particular, compared with the full-precision AlexNet and ResNet18, …
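The core of QPL, learning bitwidths by reparameterization, can be illustrated with a small sketch. The snippet below is a minimal, hypothetical TensorFlow/Keras illustration (not the authors' released code): it treats a layer's bitwidth as a continuous trainable scalar, rounds it in the forward pass with a straight-through estimator, and adds a precision loss so the optimizer can trade accuracy against model size. All names (`LearnedBitwidthQuantizer`, the symmetric uniform quantizer, the weighting `lam`) are assumptions for illustration.

```python
import tensorflow as tf

class LearnedBitwidthQuantizer(tf.keras.layers.Layer):
    """Hypothetical sketch of QPL-style bitwidth reparameterization:
    the bitwidth is a continuous trainable scalar, rounded in the
    forward pass with a straight-through estimator (STE)."""

    def __init__(self, init_bits=8.0, lam=1e-3, **kwargs):
        super().__init__(**kwargs)
        self.lam = lam  # weight of the precision (bitwidth) loss
        self.bits = self.add_weight(
            name="bits", shape=(),
            initializer=tf.constant_initializer(init_bits), trainable=True)

    def call(self, w):
        # Round the continuous bitwidth; the STE passes gradients through.
        b = self.bits + tf.stop_gradient(tf.round(self.bits) - self.bits)
        levels = 2.0 ** b - 1.0
        # Symmetric uniform quantization of weights to [-1, 1], again via STE.
        w_c = tf.clip_by_value(w, -1.0, 1.0)
        w_q = tf.round(w_c * levels) / levels
        w_q = w_c + tf.stop_gradient(w_q - w_c)
        # Precision loss: penalizing large bitwidths shrinks them in training.
        self.add_loss(self.lam * self.bits)
        return w_q
```

During training, the classification loss pulls `bits` up while the precision loss pushes it down, so each layer settles at its own bitwidth; the per-layer settled values are the mixed-precision policy.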
Funding: Supported by ONR UMass Dartmouth Marine and UnderSea Technology (MUST) grant N00014-20-1-2849 under the project S31320000049160; by DOE grant DE-SC0023164 sub-award RC114586-UMD; by AFOSR grants FA9550-18-1-0383 and FA9550-23-1-0037; by Michigan State University; by AFOSR grants FA9550-19-1-0281 and FA9550-18-1-0383; and by DOE grant DE-SC0023164.
Abstract: Additive Runge-Kutta methods designed to preserve highly accurate solutions in mixed-precision computation were previously proposed and analyzed. These specially designed methods use reduced precision for the implicit computations and full precision for the explicit computations. In this work, we analyze the stability properties of these methods and their sensitivity to low-precision rounding errors, and demonstrate their performance in terms of accuracy and efficiency. We develop codes in FORTRAN and Julia to solve nonlinear systems of ODEs and PDEs using the mixed-precision additive Runge-Kutta (MP-ARK) methods. The convergence, accuracy, and runtime of these methods are explored. We show that, for a given level of accuracy, suitably chosen MP-ARK methods may provide significant reductions in runtime.
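The central idea, reduced precision where the expensive implicit solve happens and full precision elsewhere, can be sketched in a few lines. The code below is a minimal illustration (not the paper's FORTRAN/Julia codes): a first-order additive (IMEX) Euler step on y' = f_E(y) + λy, with the implicit linear solve carried out in float32 and the explicit update in float64. The test problem, step size, and step count are assumptions.

```python
import numpy as np

def imex_euler_mixed(y0, f_exp, lam, dt, n_steps):
    """First-order additive (IMEX) Euler sketch of the MP-ARK idea:
    y' = f_exp(y) + lam*y, with the stiff linear part treated implicitly.
    The implicit solve runs in float32; the explicit part in float64."""
    y = np.float64(y0)
    for _ in range(n_steps):
        rhs = y + dt * f_exp(y)                    # explicit piece, full precision
        # Implicit solve (1 - dt*lam) * y_new = rhs, in reduced precision.
        rhs32 = np.float32(rhs)
        y_new32 = rhs32 / np.float32(1.0 - dt * lam)
        y = np.float64(y_new32)                    # promote back for the next step
    return y

# Usage: stiff linear decay plus a mild nonlinear forcing term.
y_end = imex_euler_mixed(1.0, lambda y: 0.1 * np.sin(y), lam=-50.0,
                         dt=0.01, n_steps=100)
print(y_end)
```

In the actual MP-ARK methods the reduced precision lives inside the Newton/linear solves of each implicit stage of a higher-order additive scheme; the sketch only shows where the precision boundary sits.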
Funding: National Natural Science Foundation of China (Grant Nos. 11972193 and 92266201).
Abstract: Effectively evaluating the firing precision of weapon equipment at low cost is one of the core problems in improving the testing of weapon systems. This paper proposes a new method, based on Bayesian theory, for evaluating the firing precision of a multiple launch rocket system (MLRS) while accounting for the credibility of the simulation system. First, a comprehensive index system for the credibility of the MLRS firing-precision simulation system is constructed in combination with the group analytic hierarchy process, and a modified method for determining the comprehensive weights of the indices is established to improve the rationality of the index weight coefficients. The Bayesian posterior estimation formula for firing precision incorporating prior information is then derived in the form of a mixed prior distribution, and the rationality of the prior information used in the estimation model is discussed quantitatively. Different evaluation methods are compared in simulation tests to validate the effectiveness of the proposed method. The experimental results show that the effectiveness of the firing-precision estimation method is improved by more than 25%.
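The mixed-prior posterior update at the heart of such a method can be illustrated with a toy conjugate model. The sketch below is a hypothetical simplification, not the paper's derivation: it estimates a firing-error variance with an inverse-gamma prior built from simulation data, mixes that prior with a vague one (the simulation component's mixture weight playing the role of its credibility), and updates with live-fire samples. The credibility value, prior parameters, and data are all assumptions.

```python
import numpy as np
from scipy.special import gammaln

def mixed_prior_posterior(x, mu, priors, weights):
    """Toy conjugate sketch of a mixed-prior Bayesian update.
    x: live-fire impact errors (known mean mu, unknown variance sigma^2).
    priors: inverse-gamma (a, b) components, e.g. one built from simulation
    data and one vague; weights: prior mixture weights (credibilities)."""
    n = len(x)
    S = float(np.sum((np.asarray(x) - mu) ** 2))
    post, log_m = [], []
    for a, b in priors:
        a_n, b_n = a + n / 2.0, b + S / 2.0
        post.append((a_n, b_n))
        # Log marginal likelihood of this component (normal / inverse-gamma).
        log_m.append(a * np.log(b) - gammaln(a) + gammaln(a_n)
                     - a_n * np.log(b_n) - 0.5 * n * np.log(2 * np.pi))
    log_m = np.asarray(log_m)
    # Posterior mixture weights: prior weight times marginal likelihood.
    w = np.asarray(weights) * np.exp(log_m - log_m.max())
    w /= w.sum()
    # Posterior mean of sigma^2: mixture of component means b_n / (a_n - 1).
    return float(sum(wk * b_n / (a_n - 1.0)
                     for wk, (a_n, b_n) in zip(w, post)))

# Usage: credibility 0.7 on the simulation-based prior, 0.3 on a vague one.
sigma2_hat = mixed_prior_posterior(
    x=[0.8, -1.1, 0.3, 1.4, -0.6], mu=0.0,
    priors=[(6.0, 5.0), (1.5, 1.0)], weights=[0.7, 0.3])
print(sigma2_hat)
```

The data automatically rebalance the mixture: the better a prior component explains the live-fire samples, the larger its posterior weight, which is how credibility-weighted prior information enters the estimate.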
Abstract: Quantization compresses a network by reducing the numerical bit width of the model, which improves computation speed. Because different layers have different redundancy and different sensitivity to data bit width, reducing the bit width causes a loss of accuracy, and it is difficult to determine the optimal bit width for different parts of the network while guaranteeing accuracy. Mixed-precision quantization can effectively reduce the amount of computation while keeping the model accuracy essentially unchanged. This paper proposes a hardware-aware optimal-assignment algorithm for mixed-precision quantization strategies adapted to low bit widths, using reinforcement learning to automatically predict a mixed-precision assignment that meets the constraints of the hardware resources. In the state-space design, the standard deviation of the weights measures the distribution differences of the data; the execution-speed feedback of a simulated neural-network accelerator serves as the environment that limits the agent's action space; and the accuracy of the quantized model after retraining is the reward function that guides the agent's deep reinforcement learning. Experimental results show that the proposed method obtains a suitable layer-by-layer quantization strategy under the given computational-resource constraints and effectively improves model accuracy. The method is highly automated, fairly general, and has strong application potential in mixed-precision quantization and embedded neural-network model deployment.
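The loop described above, per-layer state features, hardware-limited actions, accuracy-based reward, can be sketched compactly. The snippet below is a hypothetical skeleton, not the paper's implementation: `accelerator_latency`, `retrain_and_evaluate`, and the greedy placeholder policy stand in for the simulated accelerator, the retraining step, and the learned agent.

```python
import numpy as np

def layer_state(weights, layer_idx, n_layers):
    """State features for one layer: relative index, size, and weight std
    (the std measures the layer's weight-distribution difference)."""
    return np.array([layer_idx / n_layers, weights.size,
                     float(np.std(weights))])

def search_bitwidths(layers, policy, latency_budget, accelerator_latency,
                     retrain_and_evaluate, bit_choices=(2, 4, 8)):
    """One rollout of a hypothetical RL bitwidth search: the simulated
    accelerator's latency feedback prunes actions, and retrained accuracy
    is the reward that would train the policy."""
    bits = []
    for i, w in enumerate(layers):
        s = layer_state(w, i, len(layers))
        # Keep only actions whose projected latency fits the budget.
        feasible = [b for b in bit_choices
                    if accelerator_latency(bits + [b], layers) <= latency_budget]
        bits.append(policy(s, feasible or [min(bit_choices)]))
    reward = retrain_and_evaluate(bits)   # accuracy after retraining
    return bits, reward

# Usage with trivial stubs (assumptions, for illustration only).
layers = [np.random.randn(64, 32), np.random.randn(32, 10)]
lat = lambda bits, ls: sum(b * w.size for b, w in zip(bits, ls))
pol = lambda s, feas: min(feas)           # greedy placeholder policy
bits, r = search_bitwidths(layers, pol, latency_budget=1e4,
                           accelerator_latency=lat,
                           retrain_and_evaluate=lambda b: 0.0)
print(bits, r)
```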
Abstract: To meet the low-power, low-latency, and high-accuracy requirements of autonomous-driving decision computation, this paper designs a hardware accelerator, supporting mixed-precision arithmetic, for deep reinforcement learning autonomous-driving decision algorithms. A multiply-and-accumulate unit (MAC) is designed by reconfiguring multiple arithmetic units so that it supports computation in several precision modes, which increases the accelerator's flexibility and reduces the deployment cost of quantized models; the dataflow is optimized at multiple levels to increase data reuse and improve the accelerator's energy efficiency. The accelerator is tested on the stochastic latent actor-critic (SLAC) autonomous-driving decision algorithm. The results show an effective throughput of 18.3 GOPS, 10.7x that of a CPU and 3.3x that of a GPU, and an energy efficiency of 2.197 GOPS/W, 104x that of the CPU and 28x that of the GPU. A most significant bit data coding (MSB-DC) method is also proposed to implement intra-layer mixed-precision feature-map computation; experiments show that this method effectively reduces quantization error at a modest latency cost.
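As a rough illustration of intra-layer mixed precision on feature maps, the sketch below splits each quantized activation into its high-order (MSB) part and a low-order remainder, computes the dot product on the MSB part first, and optionally adds the low-order correction. This is only a plausible reading of the MSB-DC idea stated above, not the paper's scheme; the split point and the skip rule are assumptions.

```python
import numpy as np

def msb_split(x_q, total_bits=8, msb_bits=4):
    """Split unsigned quantized activations into high-order (MSB) and
    low-order parts: x_q = hi * 2**(total_bits - msb_bits) + lo."""
    shift = total_bits - msb_bits
    hi = x_q >> shift
    lo = x_q & ((1 << shift) - 1)
    return hi, lo, shift

def mixed_precision_dot(x_q, w_q, total_bits=8, msb_bits=4, use_lsb=True):
    """Dot product computed MSB-first; the LSB correction can be skipped
    to trade a little accuracy for latency (the intra-layer mixing)."""
    hi, lo, shift = msb_split(x_q, total_bits, msb_bits)
    acc = int(np.dot(hi, w_q)) << shift      # cheap high-order pass
    if use_lsb:
        acc += int(np.dot(lo, w_q))          # low-order correction
    return acc

# Usage: the result is exact when the LSB pass is included.
x = np.array([200, 17, 96, 255], dtype=np.int64)
w = np.array([3, -1, 2, 1], dtype=np.int64)
assert mixed_precision_dot(x, w) == int(np.dot(x, w))
print(mixed_precision_dot(x, w, use_lsb=False), int(np.dot(x, w)))
```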