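Composite services running in the Internet environment are prone to failure caused by resource faults and component failures. Existing failure recovery measures improve service availability, but they also degrade service performance. To quantify the performance of composite services when failures are recoverable, this work combines composite service failure types with recovery strategies and presents a performance analysis model for composite services that accounts for failure recovery. Queueing Petri nets (QPN) are used to describe how failures occur in a composite service and how they are handled during recovery, with a focus on service operation under the retry and replacement strategies. The internal structure of the QPN models for service nodes and links with failure recovery is described in detail; on this basis, a performance model for the decentralized execution of composite services is built through the service interaction mechanism. Finally, the QPME tool is used to simulate and compare the performance of the composite service model under different failure rates, failure type distributions, and recovery strategies. The results show that the method can quantitatively analyze the effect of failure recovery on composite service performance and can help guide the design of failure recovery strategies for information service systems in uncertain network environments.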
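As a rough illustration of why retry and replacement affect performance, the sketch below computes the expected latency of one service invocation in closed form. It is not the QPN model from the abstract: the function name, the independent-failure assumption, and the always-successful replacement service are all assumptions made for this example.

```python
def expected_latency(p_fail, t_service, max_retries, t_replacement):
    """Expected time to complete one invocation under retry-then-replace.

    Assumptions (hypothetical, not from the source): every attempt takes
    t_service and fails independently with probability p_fail; a failed
    attempt still costs the full t_service; once the first attempt and all
    max_retries retries have failed, the request goes to a replacement
    service that always succeeds in t_replacement.
    """
    attempts = max_retries + 1
    expected = 0.0
    p_reach = 1.0  # probability that the current attempt is reached
    for _ in range(attempts):
        expected += p_reach * t_service
        p_reach *= p_fail
    # p_reach is now the probability that every attempt failed.
    expected += p_reach * t_replacement
    return expected
```

For instance, with a 50% failure probability, unit service time, one retry, and a replacement that takes four time units, the expected latency is 1 + 0.5 + 0.25 * 4 = 2.5 time units, compared with 1 when no failures occur.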
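Because the environment in which multiple agents operate changes dynamically, and the decisions of each agent also affect the other agents, single-agent deep reinforcement learning algorithms struggle to remain stable in multi-agent environments. To adapt to multi-agent environments, this paper uses the Centralized Training with Decentralized Execution (CTDE) framework to improve the single-agent deep reinforcement learning algorithm Soft Actor-Critic (SAC), introduces an inter-agent communication mechanism, and constructs the Multi-Agent Soft Actor-Critic (MASAC) algorithm. In MASAC, agents share observations and historical experience, which effectively reduces the impact of environmental instability on the algorithm. Finally, the paper analyzes the performance of MASAC experimentally on cooperative tasks and on mixed cooperative-competitive tasks; the results show that MASAC is more stable than SAC in multi-agent environments.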
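The CTDE idea can be sketched by looking only at what each network sees. The snippet below is a minimal illustration of the common CTDE pattern, in which the actor uses one agent's local observation while the training-time critic receives the joint observations and actions. The function names are hypothetical, and the exact inputs used by MASAC are not given in the abstract.

```python
from typing import List, Sequence


def actor_input(observations: Sequence[Sequence[float]], agent_id: int) -> List[float]:
    """Decentralized execution: an actor only sees its own observation."""
    return list(observations[agent_id])


def centralized_critic_input(
    observations: Sequence[Sequence[float]],
    actions: Sequence[Sequence[float]],
) -> List[float]:
    """Centralized training: the critic sees all observations and actions.

    Assumption: plain concatenation, observations first and actions second.
    """
    joint: List[float] = []
    for obs in observations:
        joint.extend(obs)
    for act in actions:
        joint.extend(act)
    return joint
```

With two agents that each observe two values and emit one action, the actor input has length 2 while the critic input has length 6, which is what lets the critic account for the other agents' behavior during training.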
The smart grid uses demand side management to motivate energy users to cut demand during peak power consumption periods, which greatly improves the operating efficiency of the power grid. However, as the number of energy users participating in the smart grid continues to increase, the demand side management strategy of an individual agent is strongly affected by the dynamic strategies of other agents. In addition, existing demand side management methods need to obtain users’ power consumption information, which seriously threatens users’ privacy. To address the dynamics of the multi-microgrid demand side management model, a novel multi-agent reinforcement learning method based on the centralized training and decentralized execution paradigm is presented to mitigate the loss of training performance caused by unstable training experience. To protect users’ privacy, we design a neural network with fixed parameters as an encryptor that transforms users’ energy consumption information from a low-dimensional to a high-dimensional representation, and we theoretically prove that the proposed encryptor-based privacy-preserving method does not affect the convergence of the reinforcement learning algorithm. We verify the effectiveness of the proposed demand side management scheme with real-world energy consumption data from Xi’an, Shaanxi, China. Simulation results show that the proposed method effectively improves users’ satisfaction while reducing bill payments compared with traditional reinforcement learning (RL) methods (i.e., deep Q learning (DQN), deep deterministic policy gradient (DDPG), QMIX, and multi-agent deep deterministic policy gradient (MADDPG)). The results also demonstrate that the proposed privacy protection scheme effectively protects users’ privacy while preserving the performance of the algorithm.
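A fixed-parameter encryptor of the kind described above can be sketched as a single frozen layer that lifts a low-dimensional reading into a higher-dimensional vector. This is only an illustration: the abstract does not give the architecture, so the seeded Gaussian weights, the tanh activation, the dimensions, and the name `make_encryptor` are all assumptions.

```python
import math
import random


def make_encryptor(in_dim, out_dim, seed=0):
    """Build a fixed (never trained) one-layer encoder: R^in_dim -> R^out_dim.

    Assumptions (hypothetical): weights are drawn once from a seeded
    Gaussian and then frozen; tanh is used as the nonlinearity.
    """
    rng = random.Random(seed)
    weights = [[rng.gauss(0.0, 1.0) for _ in range(in_dim)] for _ in range(out_dim)]

    def encrypt(x):
        if len(x) != in_dim:
            raise ValueError("expected input of length %d" % in_dim)
        # Each output unit is tanh of a fixed linear combination of the inputs.
        return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in weights]

    return encrypt
```

Calling `make_encryptor(3, 16, seed=42)` yields a function that maps a three-value consumption reading to a 16-value vector; because the weights never change, the same reading always maps to the same vector, so a learning agent can be trained on the transformed data instead of the raw readings.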
Decentralized cloud platforms have emerged as a promising paradigm for exploiting idle computing resources across the Internet to keep up with ever-increasing cloud computing demands. Because any user or enterprise can be a cloud provider in the decentralized cloud, assessing the performance of heterogeneous computing resources is vitally important. However, given the untrustworthiness of the participants and the lack of a unified performance assessment metric, the reliability of performance monitoring and the incentive for cloud providers to offer real and stable performance together constitute the computational performance assessment problem in the decentralized cloud. In this paper, we present a robust performance assessment solution, RODE, to solve this problem. RODE mainly consists of a performance monitoring mechanism and an assessment of the claimed performance (AoCP) mechanism. The performance monitoring mechanism first generates reliable and verifiable performance monitoring results for the workloads executed by untrusted cloud providers. Based on these results, the AoCP mechanism forms a unified performance assessment metric to incentivize cloud providers to offer the performance they claim. Through extensive experiments, we show that RODE can accurately and reliably monitor the performance of cloud providers, and that it incentivizes cloud providers to report performance information honestly and to maintain stable performance.
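To improve the control performance and algorithm convergence speed of automatic generation control (AGC) in integrated energy systems, this paper proposes an AGC method based on Multi-Agent Transfer Soft Actor-Critic with Long-Short Term Memory (MATSAC-LSTM). First, an LSTM network extracts temporal features from collected environment state variables such as the area control error, and these features serve as the input to the MATSAC algorithm, so that the agents can use historical information to make fast active power allocation decisions. Second, a centralized training and decentralized execution framework is adopted: the environment state observed by one agent, together with the action information of the other agents, is fed to that agent's Critic network, so that the agents can share information during training. Finally, transfer learning carries the Critic and Actor network parameters trained on an old task over to the corresponding models of a new task to improve the agents' training efficiency. Case studies are carried out on a modified IEEE standard two-area load frequency control system model and on a five-area integrated energy system model. Simulation results show that, compared with traditional algorithms such as proportional-integral-derivative control, Q-learning, twin delayed deep deterministic policy gradient, win-or-learn-fast policy hill-climbing based on policy dynamics, and soft actor-critic, the proposed MATSAC-LSTM algorithm improves the AGC control performance standard and the convergence speed, and reduces the area control error and frequency deviation of the system.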
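The transfer step, in which Critic and Actor parameters trained on an old task initialize the models of a new task, can be sketched as a parameter copy. The snippet below is a minimal illustration only: parameters are plain lists keyed by name, and copying only entries whose name and length match is an assumption of this example, not something stated in the abstract.

```python
import copy


def transfer_parameters(source_params, target_params):
    """Return new-task parameters initialized from old-task parameters.

    Assumptions (hypothetical): parameters are stored as name -> list of
    floats; an entry is transferred only when the same name exists in the
    target with the same length; everything else keeps its target value.
    """
    transferred = copy.deepcopy(dict(target_params))
    for name, value in source_params.items():
        if name in target_params and len(value) == len(target_params[name]):
            transferred[name] = copy.deepcopy(value)
    return transferred
```

Layers whose shapes differ between the two tasks, for example an input layer sized for a different number of control areas, keep their fresh initialization and are trained from scratch.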
Funding (smart grid demand side management study): supported in part by the National Science Foundation of China (61973247, 61673315, 62173268), the Key Research and Development Program of Shaanxi (2022GY-033), the National Postdoctoral Innovative Talents Support Program of China (BX20200272), the Key Program of the National Natural Science Foundation of China (61833015), and the Fundamental Research Funds for the Central Universities (xzy022021050).
Funding (decentralized cloud performance assessment study): This work is supported by the National Natural Science Foundation of China under Grant Nos. 61832006 and 61872240.