Abstract
A computing power network must maximize system performance while meeting users' service requirements. Existing methods mainly convert the problem through multi-objective weighting and then solve it, which makes the hyperparameters difficult to determine and generalizes poorly across scenarios. Based on an analysis of the characteristics of the network's objectives, constrained policy optimization was adopted: user service requirements were taken as policy constraints and the performance indicators of the computing power network as the optimization objective. Through a multi-level value-policy-hyperparameter iteration, user service requirements were guaranteed in expectation while system performance was optimized. In addition, a multi-scale step length (MSL) method for hyperparameter optimization was studied, which further improved the stability and accuracy of the system. Simulation results show that the proposed method has good convergence and stability in single-terminal single-edge-server and multi-terminal multi-edge-server architectures and under changing system load.
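The contrast between fixed objective weights and a constrained formulation can be illustrated with a minimal sketch. The code below is not the authors' algorithm: it is a generic primal-dual (Lagrangian) iteration on a hypothetical one-dimensional problem, in which a performance term is maximized subject to a single requirement `x <= limit`. The function name, the toy objective, and the staged step schedule are all assumptions made for illustration, and the schedule is only one possible reading of "multi-scale step length". The multiplier `lam` takes the place of a hand-tuned weight and is learned from the constraint violation, first with coarse steps and then with finer ones.

```python
def constrained_primal_dual(limit=0.5, target=0.8, lr_x=0.05,
                            stages=((0.5, 300), (0.1, 300), (0.02, 300))):
    """Toy primal-dual iteration (illustrative only, not the paper's method).

    Maximize the performance term -(x - target)**2 subject to x <= limit.
    Each entry of `stages` is a hypothetical (multiplier step, iterations)
    pair, applied from coarse to fine.
    """
    x, lam = 0.0, 0.0
    for lam_step, iters in stages:
        for _ in range(iters):
            # Primal step: gradient ascent on the Lagrangian
            # L(x, lam) = -(x - target)**2 - lam * (x - limit), with x kept in [0, 1].
            grad_x = -2.0 * (x - target) - lam
            x = min(1.0, max(0.0, x + lr_x * grad_x))
            # Dual step: raise lam while the requirement is violated,
            # lower it otherwise, and never let it go negative.
            lam = max(0.0, lam + lam_step * (x - limit))
    return x, lam


if __name__ == "__main__":
    x, lam = constrained_primal_dual()
    print(f"x = {x:.4f}, lam = {lam:.4f}")
```

With the default arguments the stationary point of this toy problem is `x = 0.5` and `lam = 0.6`: the requirement is met exactly, and the multiplier equals the slope of the performance term at the boundary. In a weighted formulation that value would have to be guessed in advance and would change whenever `limit` or `target` changes.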
Authors
沈林江
曹畅
崔超
张岩
SHEN Linjiang; CAO Chang; CUI Chao; ZHANG Yan (Inspur Communication Information System Co., Ltd., Jinan 250100, China; Research Institute of China United Network Communications Co., Ltd., Beijing 100048, China)
Source
《电信科学》
2023, No. 8, pp. 136-148 (13 pages)
Telecommunications Science
Keywords
computing power network
multi-objective optimization
reinforcement learning