Mobile Edge Computing (MEC) is one of the most promising techniques for next-generation wireless communication systems. In this paper, we study the problem of dynamic caching, computation offloading, and resource allocation in cache-assisted multi-user MEC systems with stochastic task arrivals. There are multiple computationally intensive tasks in the system, and each Mobile User (MU) needs to execute a task either locally or remotely in one or more MEC servers by offloading the task data. Popular tasks can be cached in MEC servers to avoid duplicates in offloading. The cached contents can be obtained through user offloading, fetched from a remote cloud, or fetched from another MEC server. The objective is to minimize the long-term average of a cost function, defined as a weighted sum of energy consumption, delay, and cache content fetching costs. The weighting coefficients associated with the different metrics in the objective function can be adjusted to balance the tradeoff among them. The optimal design is performed with respect to four decision parameters: whether to cache a given task, whether to offload a given uncached task, how much transmission power should be used during offloading, and how many MEC resources should be allocated for executing a task. We propose to solve the problem by developing a dynamic scheduling policy based on Deep Reinforcement Learning (DRL) with the Deep Deterministic Policy Gradient (DDPG) method. A new decentralized DDPG algorithm is developed to obtain the optimal designs for multi-cell MEC systems by leveraging cooperation among neighboring MEC servers. Simulation results demonstrate that the proposed algorithm outperforms existing strategies such as Deep Q-Network (DQN).
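As a concrete illustration of the cost structure described above, the following minimal sketch shows a weighted per-slot cost of the assumed form; the weight names, field names, and example values are illustrative assumptions, not taken from the paper.

```python
# Hypothetical sketch of the weighted per-slot cost described in the abstract.
# The weights (w_energy, w_delay, w_fetch) and field names are illustrative.
from dataclasses import dataclass

@dataclass
class SlotMetrics:
    energy: float      # total energy consumed in the slot (J)
    delay: float       # total task delay in the slot (s)
    fetch_cost: float  # cost of fetching cache contents (cloud or peer MEC)

def slot_cost(m: SlotMetrics, w_energy=1.0, w_delay=1.0, w_fetch=1.0) -> float:
    """Weighted cost whose long-term average the DDPG policy minimizes."""
    return w_energy * m.energy + w_delay * m.delay + w_fetch * m.fetch_cost

# Raising a weight shifts the tradeoff, e.g. toward delay minimization:
print(slot_cost(SlotMetrics(2.0, 0.5, 0.1)))
print(slot_cost(SlotMetrics(2.0, 0.5, 0.1), w_delay=10.0))
```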
Unmanned Aerial Vehicles (UAVs) play a vital role in military warfare. In a variety of battlefield mission scenarios, UAVs are required to safely fly to designated locations without human intervention. Therefore, finding a suitable method to solve the UAV Autonomous Motion Planning (AMP) problem can improve the success rate of UAV missions to a certain extent. In recent years, many studies have used Deep Reinforcement Learning (DRL) methods to address the AMP problem and have achieved good results. From the perspective of sampling, this paper designs a sampling method with double screening, combines it with the Deep Deterministic Policy Gradient (DDPG) algorithm, and proposes the Relevant Experience Learning-DDPG (REL-DDPG) algorithm. The REL-DDPG algorithm uses a Prioritized Experience Replay (PER) mechanism to break the correlation of consecutive experiences in the experience pool, finds the experiences most similar to the current state to learn from, following theories in human education, and expands the influence of the learning process on action selection in the current state. All experiments are conducted in a complex unknown simulation environment constructed based on the parameters of a real UAV. The training experiments show that REL-DDPG improves both the convergence speed and the converged result compared to the state-of-the-art DDPG algorithm, while the testing experiments show the applicability of the algorithm and investigate its performance under different parameter conditions.
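A hedged sketch of the "relevant experience" idea follows: given a replay buffer of stored states, pick the transitions closest to the current state and learn from those. The Euclidean similarity measure, buffer layout, and batch size are assumptions; the paper additionally combines this with PER.

```python
# Minimal sketch (assumptions: Euclidean similarity, flat buffer) of selecting
# the stored transitions most similar to the current state for training.
import numpy as np

def relevant_batch(states: np.ndarray, current: np.ndarray, k: int = 64):
    """Return indices of the k stored states nearest to the current state."""
    dists = np.linalg.norm(states - current, axis=1)
    return np.argsort(dists)[:k]

buffer_states = np.random.randn(10_000, 8)   # toy replay buffer of 8-D states
idx = relevant_batch(buffer_states, np.random.randn(8))
print(idx[:5])                                # indices to sample for learning
```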
A novel distributed reinforcement learning (DRL) strategy is proposed in this study to coordinate current sharing and voltage restoration in an islanded DC microgrid. Firstly, a reward function considering both equal proportional current sharing and cooperative voltage restoration is defined for each local agent. The global reward of the whole DC microgrid, which is the sum of the local rewards, is regarded as the optimization objective for DRL. Secondly, by using the distributed consensus method, the predefined pinning consensus value that maximizes the global reward is obtained. An adaptive updating method is proposed to ensure the stability of the pinning consensus method under uncertain communication. Finally, the proposed DRL is implemented along with the synchronization-seeking process of the pinning reward to maximize the global reward and achieve an optimal solution for the DC microgrid. Simulation studies on a typical DC microgrid demonstrate that the proposed DRL is computationally efficient and able to provide an optimal solution even when the communication topology changes.
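The toy example below illustrates the distributed consensus primitive such a strategy builds on: each agent repeatedly mixes its local value with its neighbors' until all agree on a common value. The ring topology and Laplacian-based weights are assumptions for illustration, not the paper's exact update.

```python
# Toy consensus iteration (assumed Laplacian weights on a 4-agent ring),
# showing how local agents converge to a shared value without a coordinator.
import numpy as np

def consensus_step(x: np.ndarray, W: np.ndarray) -> np.ndarray:
    return W @ x  # each agent mixes its value with its neighbors'

n = 4
A = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]], float)                 # ring communication graph
W = np.eye(n) - 0.25 * (np.diag(A.sum(1)) - A)      # I - eps * Laplacian
x = np.array([1.0, 2.0, 3.0, 4.0])                  # initial local values
for _ in range(50):
    x = consensus_step(x, W)
print(x)  # all agents converge to the average, 2.5
```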
Nowadays, rapidly developing artificial intelligence has become a key solution for problems in diverse disciplines, especially those involving big data. Successes in these areas have also attracted researchers from the fluid mechanics community, especially in the field of active flow control (AFC). This article surveys recent successful applications of machine learning in AFC, highlights general ideas, and aims to offer a basic outline for those interested in this specific topic. In this short review, we focus on two methodologies, i.e., genetic programming (GP) and deep reinforcement learning (DRL), both of which have proven effective, efficient, and robust in certain AFC problems, and outline some future prospects that might shed light on relevant studies.
Software-Defined Networking (SDN) adopts logically centralized control by decoupling the control plane from the data plane and provides efficient use of network resources. However, due to the limitation of traditional routing strategies relying on manual configuration, SDN may suffer from link congestion and inefficient bandwidth allocation among flows, which can degrade network performance significantly. In this paper, we propose EARS, an intelligence-driven experiential network architecture for automatic routing. EARS adopts deep reinforcement learning (DRL) to simulate the human method of learning experiential knowledge, and employs a closed-loop network control mechanism incorporating network monitoring technologies to realize interaction with the network environment. The proposed EARS can learn to make better control decisions from its own experience by interacting with the network environment, and can optimize the network intelligently by adjusting the services and resources offered based on network requirements and environmental conditions. Under this network architecture, we design a network utility function with throughput and delay awareness, differentiate flows based on their size characteristics, and design a DDPG-based automatic routing algorithm as the DRL decision brain to find near-optimal paths for mice and elephant flows. To validate the network architecture, we implement it in a real network environment. Extensive simulation results show that EARS significantly improves network throughput and reduces the average packet delay in comparison with baseline schemes (e.g., OSPF, ECMP).
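A minimal sketch of a throughput- and delay-aware path utility in the spirit of EARS is shown below; the log-throughput-minus-weighted-delay form and the beta weight are assumptions rather than the paper's exact function.

```python
# Illustrative per-path utility: reward throughput with diminishing returns,
# penalize delay linearly. The functional form and beta are assumptions.
import math

def path_utility(throughput_mbps: float, delay_ms: float, beta: float = 0.1) -> float:
    return math.log(1.0 + throughput_mbps) - beta * delay_ms

# Elephant flows might use a small beta (throughput-driven), mice flows a
# larger one (latency-driven):
print(path_utility(100.0, 20.0))            # elephant-style weighting
print(path_utility(100.0, 20.0, beta=0.5))  # mice-style weighting
```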
In recent years, artificial neural networks (ANNs) and deep learning have become increasingly popular across a wide range of scientific and technical fields, including fluid mechanics. While it will take time to fully grasp the potentialities as well as the limitations of these methods, evidence is starting to accumulate that points to their potential in helping solve problems for which no theoretically optimal solution method is known. This is particularly true in fluid mechanics, where problems involving optimal control and optimal design abound. Indeed, such problems are famously difficult to solve effectively with traditional methods due to the combination of nonlinearity, nonconvexity, and high dimensionality they involve. By contrast, deep reinforcement learning (DRL), an optimization method based on teaching empirical strategies to an ANN through trial and error, is well adapted to solving such problems. In this short review, we offer an insight into the current state of the art of the use of DRL within fluid mechanics, focusing on control and optimal design problems.
This study proposes a deep reinforcement learning (DRL)-based approach to analyze the optimal power flow (OPF) of distribution networks (DNs) embedded with renewable energy and storage devices. First, the OPF of the DN is formulated as a stochastic nonlinear programming problem. Then, the multi-period nonlinear programming decision problem is formulated as a Markov decision process (MDP), which is composed of multiple single-time-step sub-problems. Subsequently, the state-of-the-art DRL algorithm, i.e., proximal policy optimization (PPO), is used to solve the MDP sequentially considering the impact on the future. Neural networks are used to extract operation knowledge from historical data offline and provide online decisions according to the real-time state of the DN. The proposed approach fully exploits the historical data and reduces the influence of prediction error on the optimization results. The proposed real-time control strategy can provide more flexible decisions and achieve better performance than pre-determined ones. Comparative results demonstrate the effectiveness of the proposed approach.
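For reference, the clipped surrogate objective that PPO optimizes can be sketched as follows; this is the generic textbook form, not code from the study.

```python
# The PPO clipped surrogate loss: limit how far the new policy can move from
# the old one per update by clipping the probability ratio.
import torch

def ppo_clip_loss(logp_new, logp_old, advantage, eps=0.2):
    ratio = torch.exp(logp_new - logp_old)        # pi_new(a|s) / pi_old(a|s)
    unclipped = ratio * advantage
    clipped = torch.clamp(ratio, 1.0 - eps, 1.0 + eps) * advantage
    return -torch.min(unclipped, clipped).mean()  # pessimistic bound, negated

logp_new = torch.randn(32, requires_grad=True)    # toy batch of 32 samples
loss = ppo_clip_loss(logp_new, torch.randn(32), torch.randn(32))
loss.backward()
print(float(loss))
```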
Multi-agent reinforcement learning has recently been applied to solve pursuit problems. However, it suffers from a large number of time steps per training episode and thus often struggles to converge effectively, resulting in low rewards and an inability for agents to learn strategies. This paper proposes a deep reinforcement learning (DRL) training method that employs an ensemble segmented multi-reward function design approach to address this convergence problem. The ensemble reward function combines the advantages of two reward functions, which enhances the training effect of agents in long episodes. Then, we eliminate the non-monotonic behavior in the reward function introduced by the trigonometric functions in the traditional 2D polar-coordinate observation representation. Experimental results demonstrate that this method outperforms the traditional single-reward-function mechanism in the pursuit scenario by enhancing agents' policy scores on the task. These ideas offer a solution to the convergence challenges faced by DRL models in long-episode pursuit problems, leading to improved model training performance.
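One way such a segmented ensemble reward could look is sketched below: a dense shaping term far from the target and a sparse terminal reward near it. The switch threshold and both reward forms are illustrative assumptions, not the paper's design.

```python
# Hypothetical segmented ensemble reward for a long pursuit episode: dense
# distance shaping far out, a small step penalty in the end game, and a
# sparse terminal reward on capture. All constants are assumed.
def ensemble_reward(dist_to_target: float, captured: bool,
                    switch_dist: float = 5.0) -> float:
    if captured:
        return 100.0                      # sparse terminal reward
    if dist_to_target > switch_dist:
        return -0.01 * dist_to_target     # dense shaping far from the target
    return -0.1                           # step penalty near the target

print(ensemble_reward(20.0, False))  # shaping regime
print(ensemble_reward(2.0, False))   # end-game regime
print(ensemble_reward(0.0, True))    # capture
```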
With the increasing penetration of renewable energy, power grid operators are observing both fast and large fluctuations in power and voltage profiles on a daily basis. Fast and accurate control actions derived in real time are vital to ensure system security and economics. To this end, solving alternating current (AC) optimal power flow (OPF) with operational constraints remains an important yet challenging optimization problem for secure and economic operation of the power grid. This paper adopts a novel method to derive fast OPF solutions using a state-of-the-art deep reinforcement learning (DRL) algorithm, which can greatly assist power grid operators in making rapid and effective decisions. The presented method adopts imitation learning to generate initial weights for the neural network (NN), and a proximal policy optimization algorithm to train and test stable and robust artificial intelligence (AI) agents. Training and testing procedures are conducted on the IEEE 14-bus and the Illinois 200-bus systems. The results show the effectiveness of the method, with significant potential for assisting power grid operators in real-time operations.
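The imitation-learning warm start can be sketched as plain behavior cloning: regress the policy network onto (state, action) pairs produced by a conventional solver before PPO fine-tuning. All shapes and the stand-in dataset below are placeholders.

```python
# Hedged sketch of the imitation-learning warm start: fit the policy network
# to solver-generated (state, action) pairs before RL training. The 28-D
# state, 5-D action, and random "expert" data are placeholders.
import torch
import torch.nn as nn

policy = nn.Sequential(nn.Linear(28, 64), nn.ReLU(), nn.Linear(64, 5))
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)

states = torch.randn(1024, 28)    # stand-in for historical grid states
expert = torch.randn(1024, 5)     # stand-in for solver-computed setpoints

for _ in range(100):              # behavior cloning: plain regression
    opt.zero_grad()
    loss = nn.functional.mse_loss(policy(states), expert)
    loss.backward()
    opt.step()
print(float(loss))                # cloned weights then initialize PPO training
```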
Solving constrained multi-objective optimization problems with evolutionary algorithms has attracted considerable attention. Various constrained multi-objective optimization evolutionary algorithms (CMOEAs) have been developed with the use of different algorithmic strategies, evolutionary operators, and constraint-handling techniques. The performance of CMOEAs may depend heavily on the operators used; however, it is usually difficult to select suitable operators for the problem at hand. Hence, improving operator selection is promising and necessary for CMOEAs. This work proposes an online operator selection framework assisted by Deep Reinforcement Learning. The dynamics of the population, including convergence, diversity, and feasibility, are regarded as the state; the candidate operators are considered as actions; and the improvement of the population state is treated as the reward. By using a Q-network to learn a policy that estimates the Q-values of all actions, the proposed approach can adaptively select an operator that maximizes the improvement of the population according to the current state and thereby improve algorithmic performance. The framework is embedded into four popular CMOEAs and assessed on 42 benchmark problems. The experimental results reveal that the proposed Deep Reinforcement Learning-assisted operator selection significantly improves the performance of these CMOEAs, and the resulting algorithms achieve better versatility compared to nine state-of-the-art CMOEAs.
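A minimal sketch of this selection loop follows, assuming a 3-D population state (convergence, diversity, feasibility) and four hypothetical candidate operators; the layer sizes and operator names are illustrative.

```python
# Q-network-based operator selection: map the population state to Q-values
# over candidate operators and pick epsilon-greedily. Names are hypothetical.
import random
import torch
import torch.nn as nn

q_net = nn.Sequential(nn.Linear(3, 32), nn.ReLU(), nn.Linear(32, 4))
OPERATORS = ["DE/rand/1", "DE/best/1", "SBX", "polynomial_mutation"]

def select_operator(state, eps=0.1) -> str:
    if random.random() < eps:
        return random.choice(OPERATORS)      # explore
    with torch.no_grad():
        q = q_net(torch.tensor(state))       # exploit learned Q-values
    return OPERATORS[int(q.argmax())]

# state = (convergence, diversity, feasibility) of the current population
print(select_operator([0.4, 0.7, 0.9]))
```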
The optimal dispatch methods of integrated energy systems (IESs) currently struggle to address the uncertainties resulting from renewable energy generation and energy demand. Moreover, the increasing intensity of the greenhouse effect renders the reduction of IES carbon emissions a priority. To address these issues, a deep reinforcement learning (DRL)-based method is proposed to optimize the low-carbon economic dispatch model of an electricity-heat-gas IES. In the DRL framework, the optimal dispatch model of the IES is formulated as a Markov decision process (MDP). A reward function based on the reward-penalty ladder-type carbon trading mechanism (RPLT-CTM) is introduced to enable the DRL agents to learn more effective dispatch strategies. Moreover, a distributed proximal policy optimization (DPPO) algorithm, which is a novel policy-based DRL algorithm, is employed to train the DRL agents. The multithreaded architecture enhances the exploration ability of the DRL agents in complex environments. Experimental results illustrate that the proposed DPPO-based IES dispatch method can mitigate carbon emissions and reduce the total economic cost. The RPLT-CTM-based reward function outperforms the CTM-based methods, providing a 4.42% and 6.41% decrease in operating cost and carbon emission, respectively. Furthermore, the superiority and computational efficiency of DPPO compared with other DRL-based methods are demonstrated by a decrease of more than 1.53% and 3.23% in the operating cost and carbon emissions of the IES, respectively.
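The reward-penalty ladder-type carbon trading idea can be illustrated as below: emissions under the quota earn a reward, and each band above the quota is priced progressively more steeply. The band width, base price, and growth rate are assumed values, not the paper's parameters.

```python
# Illustrative ladder-type carbon trading cost: negative cost (a reward) for
# staying under the free quota, and steeper prices per excess band above it.
def ladder_carbon_cost(emission: float, quota: float,
                       band: float = 100.0, price: float = 0.25,
                       growth: float = 0.25) -> float:
    excess = emission - quota
    if excess <= 0:
        return price * excess          # negative cost = reward for saving
    cost, tier = 0.0, 0
    while excess > 0:
        step = min(excess, band)
        cost += price * (1 + growth * tier) * step  # each band costs more
        excess -= step
        tier += 1
    return cost

print(ladder_carbon_cost(950.0, 800.0))  # 40.625: two bands of penalty
print(ladder_carbon_cost(700.0, 800.0))  # -25.0: reward for under-emitting
```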
Interactive Recommendation (IR) formulates recommendation as a multi-step decision-making process that can actively utilize individuals' feedback over multiple steps and optimize the long-term user benefit of recommendation. Deep Reinforcement Learning (DRL) has seen wide application in IR for e-commerce. However, the user cold-start problem impairs the learning process of DRL-based recommendation schemes. Moreover, most existing DRL-based recommendations ignore user relationships or only consider single-hop social relationships, and thus cannot fully utilize the social network. The inability of those schemes to capture multiple-hop social relationships among users in IR results in sub-optimal recommendations. To address the above issues, this paper proposes a Social Graph Neural network-based interactive Recommendation scheme (SGNR), which is a multiple-hop social relationship enhanced DRL framework. Within this framework, the multiple-hop social relationships among users are extracted from the social network via a graph neural network, which can take full advantage of the social network to provide more personalized recommendations and effectively alleviate the user cold-start problem. Experimental results on two real-world datasets demonstrate that the proposed SGNR outperforms other state-of-the-art DRL-based methods that fail to consider social relationships or only consider single-hop social relationships.
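A toy sketch of multiple-hop propagation on a social graph follows: stacking k neighbor-mixing steps lets a user's embedding absorb information from k-hop friends. The row-normalized propagation rule and one-hot features are illustrative, not SGNR's exact architecture.

```python
# Toy multi-hop aggregation: after k propagation steps, each user's row mixes
# in features from friends up to k hops away.
import numpy as np

def propagate(A_hat: np.ndarray, X: np.ndarray, hops: int) -> np.ndarray:
    for _ in range(hops):
        X = A_hat @ X          # one hop of normalized neighbor mixing
    return X

A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], float)                          # tiny social graph
A_hat = (A + np.eye(3)) / (A.sum(1, keepdims=True) + 1)   # row-normalized, self-loops
X = np.eye(3)                                             # one-hot user features
print(propagate(A_hat, X, hops=2))  # 2-hop influence reaches user 0 <-> user 2
```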
To solve the path following control problem for unmanned surface vehicles (USVs), a control method based on deep reinforcement learning (DRL) with long short-term memory (LSTM) networks is proposed. A distributed proximal policy optimization (DPPO) algorithm, a modified actor-critic-based reinforcement learning algorithm, is adapted to improve the controller performance over repeated trials. The LSTM network structure is introduced to address the strong temporal correlation in the USV control problem. In addition, a specially designed path dataset, including straight and curved paths, is established to simulate various sailing scenarios so that the reinforcement learning controller can obtain as much handling experience as possible. Extensive numerical simulation results demonstrate that the proposed method achieves better control performance in missions involving complex maneuvers than controllers trained on limited scenarios, and it can potentially be applied in practice.
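A hedged sketch of an LSTM actor of this kind is given below; the observation window, layer sizes, and single rudder-like action are placeholder assumptions.

```python
# Sketch of an LSTM actor for a path-following controller: the recurrent
# state lets the policy exploit temporal correlation in the observations.
import torch
import torch.nn as nn

class LSTMActor(nn.Module):
    def __init__(self, obs_dim=8, hidden=64, act_dim=1):
        super().__init__()
        self.lstm = nn.LSTM(obs_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, act_dim)

    def forward(self, obs_seq, hc=None):
        out, hc = self.lstm(obs_seq, hc)
        # bounded control command from the last time step
        return torch.tanh(self.head(out[:, -1])), hc

actor = LSTMActor()
action, hc = actor(torch.randn(1, 10, 8))   # a 10-step observation window
print(action.shape)                         # torch.Size([1, 1])
```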
The high penetration and uncertainty of distributed energies force the upgrade of volt-var control (VVC) to smooth the voltage and var fluctuations faster. Traditional mathematical or heuristic algorithms are increasingly incompetent for this task because of their slow online calculation speed. Deep reinforcement learning (DRL) has recently been recognized as an effective alternative, as it transfers the computational pressure to off-line training and the online calculation timescale reaches milliseconds. However, its slow offline training speed still limits its application to VVC. To overcome this issue, this paper proposes a simplified DRL method that simplifies and improves the training operations in DRL, avoiding invalid explorations and slow reward calculations. To address the problem that the DRL network parameters of the original topology are not applicable to other new topologies, side-tuning transfer learning (TL) is introduced to reduce the number of parameters that need to be updated in the TL process. Test results based on the IEEE 30-bus and 118-bus systems prove the correctness and rapidity of the proposed method, as well as its strong applicability for large-scale control variables.
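Side-tuning can be sketched as follows: freeze the base network trained on the original topology, train only a small side network, and blend the two outputs. The blending coefficient and layer sizes are illustrative assumptions.

```python
# Minimal side-tuning sketch: the frozen base keeps old-topology knowledge,
# the small side network adapts to the new topology, and a learnable alpha
# blends them. Sizes are placeholders.
import torch
import torch.nn as nn

base = nn.Sequential(nn.Linear(30, 128), nn.ReLU(), nn.Linear(128, 10))
for p in base.parameters():
    p.requires_grad = False            # reuse old-topology weights as-is

side = nn.Sequential(nn.Linear(30, 16), nn.ReLU(), nn.Linear(16, 10))
alpha = nn.Parameter(torch.tensor(0.5))  # learnable blending coefficient

def transferred_policy(x):
    return alpha * base(x) + (1 - alpha) * side(x)

print(transferred_policy(torch.randn(2, 30)).shape)  # torch.Size([2, 10])
print(sum(p.numel() for p in side.parameters()))     # far fewer trainable params
```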
The unmanned aerial vehicle (UAV) swarm technology is one of the research hotspots in recent years. With the continuous improvement of the autonomous intelligence of UAVs, swarm technology will become one of the main trends of UAV development in the future. This paper studies the behavior decision-making process of the UAV swarm rendezvous task based on the double deep Q-network (DDQN) algorithm. We design a guided reward function to effectively solve the convergence problem caused by sparse returns in deep reinforcement learning (DRL) for long-period tasks. We also propose the concept of a temporary storage area, optimizing the memory replay unit of the traditional DDQN algorithm, improving the convergence speed of the algorithm, and speeding up the training process. Different from traditional task environments, this paper establishes a continuous state-space task environment model to improve the verification process of the UAV task environment. Based on the DDQN algorithm, the collaborative tasks of the UAV swarm in different task scenarios are trained. The experimental results validate that the DDQN algorithm is efficient in training the UAV swarm to complete the given collaborative tasks while meeting the requirements of the UAV swarm for centralization and autonomy, and in improving the intelligence of UAV swarm collaborative task execution. The simulation results show that, after training, the proposed UAV swarm can carry out the rendezvous task well, with a mission success rate of 90%.
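For reference, the double-DQN target that decouples action selection (online network) from action evaluation (target network) can be sketched generically as below; the UAV-swarm state and action encodings are not shown.

```python
# Generic double-DQN target: the online net selects the best next action,
# the target net evaluates it, which reduces Q-value overestimation.
import torch
import torch.nn as nn

def ddqn_target(reward, next_state, done, online_net, target_net, gamma=0.99):
    with torch.no_grad():
        best_a = online_net(next_state).argmax(dim=1, keepdim=True)   # select
        q_next = target_net(next_state).gather(1, best_a).squeeze(1)  # evaluate
    return reward + gamma * (1.0 - done) * q_next

net, tgt = nn.Linear(4, 3), nn.Linear(4, 3)    # toy 4-D state, 3 actions
y = ddqn_target(torch.ones(5), torch.randn(5, 4), torch.zeros(5), net, tgt)
print(y.shape)                                 # torch.Size([5])
```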
Dynamic path planning is crucial for mobile robots to navigate successfully in unstructured environments. To achieve a globally optimal path and real-time dynamic obstacle avoidance during movement, a dynamic path planning algorithm incorporating improved IB-RRT∗ and deep reinforcement learning (DRL) is proposed. Firstly, an improved IB-RRT∗ algorithm is proposed for global path planning by combining double elliptic subset sampling and probabilistic central circle target bias. Then, to tackle the slow response to dynamic obstacles and the inadequate obstacle avoidance of traditional local path planning algorithms, deep reinforcement learning is utilized to predict the movement trend of dynamic obstacles, leading to a dynamic fusion path planning. Finally, the simulation and experiment results demonstrate that the proposed improved IB-RRT∗ algorithm has higher convergence speed and search efficiency compared with the traditional Bi-RRT∗, Informed-RRT∗, and IB-RRT∗ algorithms. Furthermore, the proposed fusion algorithm can effectively perform real-time obstacle avoidance and navigation tasks for mobile robots in unstructured environments.
Integrating blockchain technology into mobile-edge computing (MEC) networks with multiple cooperative MEC servers (MECS) provides a promising solution for improving resource utilization, and helps establish a secure reward mechanism that can facilitate load balancing among MECS. In addition, intelligent management of service caching and load balancing can improve the network utility in MEC blockchain networks with multiple types of workloads. In this paper, we investigate a learning-based joint service caching and load balancing policy for optimizing the allocation of communication and computation resources, so as to improve the resource utilization of MEC blockchain networks. We formulate the problem as a challenging long-term network revenue maximization Markov decision process (MDP). To address the highly dynamic, high-dimensional system states, we design a joint service caching and load balancing algorithm based on the double-dueling Deep Q-Network (DQN) approach. The simulation results validate the feasibility and superior performance of our proposed algorithm over several baseline schemes.
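The dueling head used in a double-dueling DQN decomposes Q(s, a) into a state value V(s) plus mean-centered advantages A(s, a), as in the sketch below; layer sizes are placeholders for the caching/load-balancing state and action spaces.

```python
# Dueling Q-network head: Q(s,a) = V(s) + A(s,a) - mean_a A(s,a).
import torch
import torch.nn as nn

class DuelingQNet(nn.Module):
    def __init__(self, state_dim=16, n_actions=8, hidden=64):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU())
        self.value = nn.Linear(hidden, 1)        # state value stream
        self.adv = nn.Linear(hidden, n_actions)  # advantage stream

    def forward(self, s):
        h = self.trunk(s)
        a = self.adv(h)
        return self.value(h) + a - a.mean(dim=1, keepdim=True)

print(DuelingQNet()(torch.randn(2, 16)).shape)   # torch.Size([2, 8])
```

Combining this head with the double-DQN target (shown earlier for the UAV-swarm paper) gives the "double-dueling" configuration the abstract refers to.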
In this paper, an artificial neural network (ANN) trained through a deep reinforcement learning (DRL) agent is used to perform flow control. The target is to actively search for the wake stabilization mechanism. The flow past a 2-D cylinder at a Reynolds number of 240 is addressed with and without a control strategy. The control strategy is based on two small rotating cylinders located at two symmetric positions behind the main cylinder. The rotating speed of the counter-rotating small cylinder pair is determined by the ANN and DRL approach. In the final test, the interaction of the counter-rotating small cylinder pair with the wake of the main cylinder is able to stabilize the periodic shedding of the main cylinder wake. This demonstrates that this way of establishing the control strategy is reliable and viable. Furthermore, the internal interaction mechanism of this control method can be explored with the ANN and DRL approach.
With the increased emphasis on data security in the Internet of Things (IoT), blockchain has received more and more attention. Due to the computation-intensive characteristics of blockchain, mobile edge computing (MEC) is integrated into the IoT. However, how to efficiently use edge computing resources to process the blockchain computing tasks from IoT devices has not been fully studied. In this paper, MEC and blockchain-enhanced IoT are considered. Transactions recording data or other application information are generated by the IoT devices and offloaded to the MEC servers to join the blockchain. The practical Byzantine fault tolerance (PBFT) consensus mechanism is used among all the MEC servers, which are also the blockchain nodes, and the latency of the consensus process is modeled with consideration of the characteristics of the wireless network. The joint optimization problem of serving base station (BS) selection and wireless transmission resource allocation is modeled as a Markov decision process (MDP), and the long-term system utility is defined based on task reward, credit value, the latency of the infrastructure layer and blockchain layer, and computing cost. A double deep Q-learning (DQN) based transaction offloading algorithm (DDQN-TOA) is proposed, and simulation results show the advantages of the proposed algorithm in comparison with other methods.
In this paper, we investigate a reconfigurable intelligent surface (RIS) assisted downlink orthogonal frequency division multiplexing (OFDM) transmission system. Taking hardware constraints into account, the RIS is considered to be organized into several blocks, and each block of the RIS shares the same phase shift, which has only 1-bit resolution. With multiple antennas at the base station (BS) serving multiple single-antenna users, we design the BS precoder and the RIS reflection phase shifts to maximize the minimum user spectral efficiency, so as to ensure fairness. A deep reinforcement learning (DRL) based algorithm is proposed, in which maximum ratio transmission (MRT) precoding is utilized at the BS and the dueling deep Q-network (DQN) framework is utilized for RIS phase shift optimization. Simulation results demonstrate that the proposed DRL-based algorithm can achieve almost optimal performance while consuming much less computation.
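Maximum ratio transmission precoding itself is simple to illustrate: each user's precoding vector is the conjugate of its downlink channel, normalized to unit power. The antenna and user counts below are illustrative.

```python
# MRT precoding: beam toward each user along its conjugated channel.
import numpy as np

def mrt_precoder(H: np.ndarray) -> np.ndarray:
    """H: (n_users, n_tx) downlink channels -> (n_tx, n_users) unit-norm precoders."""
    W = H.conj().T
    return W / np.linalg.norm(W, axis=0, keepdims=True)

# Toy Rayleigh channels: 4 single-antenna users, 8 BS antennas
H = (np.random.randn(4, 8) + 1j * np.random.randn(4, 8)) / np.sqrt(2)
W = mrt_precoder(H)
print(np.round(np.linalg.norm(W, axis=0), 3))  # unit-power beam per user
```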