Funding: supported by the National Natural Science Foundation of China (Grant Nos. 61425008, 61333004 and 61273054), the Top-Notch Young Talents Program of China, and the Aeronautical Foundation of China (Grant No. 20135851042).
Abstract: As one of the major contributions of biology to competitive decision making, evolutionary game theory provides a useful tool for studying the evolution of cooperation. To achieve the optimal solution for unmanned aerial vehicles (UAVs) carrying out a sensing task, this paper presents a Markov decision evolutionary game (MDEG) based learning algorithm. Each individual in the algorithm follows a Markov decision strategy to maximize its payoff against the well-known Tit-for-Tat strategy. Simulation results demonstrate that the MDEG-based approach effectively improves the collective payoff of the swarm. The proposed algorithm obtains not only the best action sequence but also a sub-optimal Markov policy that is independent of the game duration. Furthermore, the paper studies the emergence of cooperation in the evolution of self-regarding UAVs. The results show that the emergence of cooperation should be attributed to the adaptive ability of the MDEG-based approach together with the balance that the Tit-for-Tat strategy strikes between revenge and forgiveness.
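To make the game setting concrete, here is a minimal sketch of learning a memory-one (Markov) policy against Tit-for-Tat in an iterated prisoner's dilemma. It is illustrative only: the payoff values, learning hyperparameters, and single-learner view are assumptions, not the paper's MDEG formulation. It exploits the fact that Tit-for-Tat's next move always equals the learner's previous move, so the learner's own last action is a sufficient Markov state.

```python
# Hedged sketch: Q-learning of a memory-one policy versus Tit-for-Tat.
# Payoffs are the standard prisoner's-dilemma values (T=5, R=3, P=1, S=0),
# assumed for illustration rather than taken from the paper.
import random

C, D = 0, 1  # cooperate, defect
PAYOFF = {(C, C): 3, (C, D): 0, (D, C): 5, (D, D): 1}  # (mine, opponent) -> my payoff

def q_learning_vs_tft(episodes=5000, rounds=50, alpha=0.1, gamma=0.95, eps=0.1):
    # Against Tit-for-Tat, the opponent's next move equals my previous move,
    # so my previous action is the entire Markov state.
    Q = {(s, a): 0.0 for s in (C, D) for a in (C, D)}
    for _ in range(episodes):
        state = C  # TFT opens by cooperating, as if I had cooperated before
        for _ in range(rounds):
            a = random.choice((C, D)) if random.random() < eps else \
                max((C, D), key=lambda x: Q[(state, x)])
            opp = state            # TFT mirrors my previous action
            r = PAYOFF[(a, opp)]
            nxt = a                # my action becomes the next state
            Q[(state, a)] += alpha * (r + gamma * max(Q[(nxt, C)], Q[(nxt, D)]) - Q[(state, a)])
            state = nxt
    return Q
```

In this toy setting the learned policy converges to sustained mutual cooperation, since defecting against a mirroring opponent is punished on the very next round.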
Funding: supported in part by the National Natural Science Foundation of China under Grant 61701038.
Abstract: Intelligent edge computing uses edge devices of the Internet of Things (IoT) for data collection, computation, and intelligent analysis, so that data can be analyzed close to its source and feedback can be given in a timely manner. Because mobile equipment (MEs) is mobile, when an ME moves out of the coverage of its small cell network (SCN), the offloaded tasks cannot be returned to it successfully; migration therefore incurs additional costs. In this paper, joint task offloading and migration schemes based on reinforcement learning (RL) are proposed for a mobility-aware mobile edge computing (MEC) network to maximize the system revenue. Firstly, a joint optimization problem maximizing the total revenue of the MEs is formulated for the mobility-aware MEs. Secondly, considering time-varying computation tasks and resource conditions, the resulting mixed integer non-linear programming (MINLP) problem is cast as a Markov decision process (MDP). We then propose a novel reinforcement-learning-based optimization framework to solve the problem in place of traditional methods. Finally, simulation results show that the proposed schemes markedly increase the total revenue of the MEs.
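As a rough illustration of the RL formulation, the toy sketch below runs tabular Q-learning over an invented offloading MDP: the cell topology, mobility model, and reward values are stand-ins chosen for exposition and are not the paper's system model.

```python
# Hedged sketch: Q-learning over a toy mobility-aware offloading MDP.
# The state is the index of the serving small cell; the ME drifts to a
# random neighbor each step. Rewards are illustrative placeholders.
import random
import numpy as np

N_CELLS = 5
ACTIONS = ["local", "offload", "migrate"]

def step(state, action):
    next_state = (state + random.choice([-1, 0, 1])) % N_CELLS
    moved = next_state != state
    if action == "local":
        reward = 1.0                              # no offloading gain, no cost
    elif action == "offload":
        reward = 3.0 - (4.0 if moved else 0.0)    # result lost if the ME moved away
    else:  # migrate the task along with the ME
        reward = 3.0 - 1.0                        # offloading gain minus migration cost
    return next_state, reward

Q = np.zeros((N_CELLS, len(ACTIONS)))
state = 0
for t in range(20000):
    a = random.randrange(len(ACTIONS)) if random.random() < 0.1 else int(Q[state].argmax())
    nxt, r = step(state, ACTIONS[a])
    Q[state, a] += 0.1 * (r + 0.9 * Q[nxt].max() - Q[state, a])  # standard Q-update
    state = nxt
```

Even in this caricature, the learned policy reflects the paper's trade-off: pure offloading is penalized whenever mobility strands the result, so migration becomes worthwhile despite its cost.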
Funding: supported by the National Basic Research Program (973 Program) of China (Grant No. 2007CB714000).
Abstract: In shield tunneling, the control system needs highly reliable deviation-rectifying capability to ensure that the tunnel trajectory stays within the permissible criterion. To this end, we present an approach that adopts Markov decision process (MDP) theory to plan the driving force with an explicit representation of the uncertainty during excavation. The possible shield attitudes and the driving forces during excavation are discretized into a state set and an action set, respectively. In particular, an evaluation function is proposed that accounts for both the stability of the driving force and the deviation of the shield attitude. Unlike the deterministic approach, the driving forces based on the MDP model have uncertain effects, and the attitude is known only with imprecise probability. We consider the case in which the transition probability varies within a domain estimated from field data, and discuss the optimal policy based on interval arithmetic. The validity of the approach is assessed by comparing the planned driving force with actual operating data from the field records of Line 9 in Tianjin. The comparison shows that the MDP model predicts the driving force well enough for automatic deviation rectifying.
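The interval-probability idea can be sketched as robust value iteration: at each backup, the worst-case transition distribution inside the given intervals is selected before maximizing over actions. The code below is a generic sketch under standard interval-MDP assumptions (lower bounds summing to at most 1, upper bounds to at least 1); the MDP itself is a placeholder, not the shield-attitude model.

```python
# Hedged sketch: robust value iteration for an MDP with interval-valued
# transition probabilities, in the spirit of the interval-arithmetic
# policy discussed above.
import numpy as np

def worst_case_dist(lo, hi, values):
    # Choose the distribution within [lo, hi] (componentwise, summing to 1)
    # that minimizes expected value: fill low-value successors first.
    p = lo.copy()
    budget = 1.0 - p.sum()
    for i in np.argsort(values):          # cheapest successors first
        add = min(hi[i] - lo[i], budget)
        p[i] += add
        budget -= add
    return p

def robust_value_iteration(R, P_lo, P_hi, gamma=0.9, iters=200):
    # R: (nS, nA) rewards; P_lo, P_hi: (nS, nA, nS) interval bounds.
    nS, nA = R.shape
    V = np.zeros(nS)
    for _ in range(iters):
        Q = np.empty((nS, nA))
        for s in range(nS):
            for a in range(nA):
                p = worst_case_dist(P_lo[s, a], P_hi[s, a], V)
                Q[s, a] = R[s, a] + gamma * p @ V   # pessimistic backup
        V = Q.max(axis=1)
    return V, Q.argmax(axis=1)              # robust value and greedy policy
```

The greedy fill in `worst_case_dist` solves the inner minimization exactly, which is what makes interval uncertainty sets computationally convenient compared with general ambiguity sets.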
Funding: supported by the National Natural Science Foundation of China (No. 51777155).
Abstract: Driven by modern advanced information and communication technologies, distributed energy resources have great potential for energy supply within the framework of the virtual power plant (VPP). Meanwhile, demand response (DR) is becoming increasingly important for enhancing VPP operation and mitigating the risks associated with the fluctuation of renewable energy resources (RESs). In this paper, we propose an incentive-based DR program for the VPP to minimize the deviation penalty it incurs from participating in the power market. A Markov decision process (MDP) with unknown transition probabilities is constructed from the VPP's perspective to formulate the incentive-based DR program, in which the randomness of consumer behavior and RES generation is taken into consideration. Furthermore, a value function from prospect theory (PT) is adopted to characterize consumers' risk attitudes and capture the underlying psychological factors. A model-free deep reinforcement learning (DRL) based approach is proposed to handle the randomness in the model and adaptively determine the optimal DR pricing strategy for the VPP, without requiring any system model information. Finally, case study results demonstrate the effectiveness of the proposed approach.
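For reference, a prospect-theory value function in its standard Kahneman–Tversky form is shown below; the exponents and loss-aversion coefficient are the commonly cited empirical estimates, and the paper's exact calibration may differ.

```python
# Standard prospect-theory value function (Tversky & Kahneman, 1992 form);
# parameter values are the commonly cited estimates, used here only for
# illustration -- the paper's calibration may differ.
def pt_value(x, alpha=0.88, beta=0.88, lam=2.25):
    # Concave over gains, convex and steeper over losses (loss aversion),
    # with outcomes measured relative to a reference point at 0.
    return x ** alpha if x >= 0 else -lam * (-x) ** beta

# Example: a loss of 10 "feels" worse than a gain of 10 feels good.
print(pt_value(10.0), pt_value(-10.0))  # ~7.6 vs ~-17.1
```

This asymmetry is exactly what lets the DR model distinguish how consumers weigh an incentive payment against an equally sized comfort loss.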
Funding: supported by the National Natural Science Foundation of China (Grant Nos. 11991023 and 12371324) and the National Key R&D Program of China (Grant No. 2022YFA1004000).
Abstract: In this paper, we study the distributionally robust joint chance-constrained Markov decision process. Using the logarithmic transformation technique, we derive its deterministic reformulation with bi-convex terms under a moment-based uncertainty set. To cope with the non-convexity and improve the robustness of the solution, we propose a dynamical neural network approach for solving the reformulated optimization problem. Numerical results on a machine replacement problem demonstrate the efficiency of the proposed dynamical neural network approach compared with the sequential convex approximation approach.
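The role of the logarithmic transformation can be illustrated generically. The sketch below assumes the joint constraint is decomposed with auxiliary risk levels (e.g., under independence of the individual constraints); the paper's moment-based reformulation is more involved, but the log step serves the same purpose of separating a product of probabilities.

```latex
% Generic sketch: auxiliary risk levels y_i decompose the joint chance
% constraint, and the logarithm turns the resulting bilinear product
% into a separable sum (valid for y_i > 0).
\[
  \Pr\{g_i(x,\xi)\le 0,\ i=1,\dots,m\} \ge 1-\epsilon
  \quad\text{is enforced via}\quad
  \Pr\{g_i(x,\xi)\le 0\} \ge y_i,\ \ \prod_{i=1}^{m} y_i \ge 1-\epsilon,
\]
\[
  \prod_{i=1}^{m} y_i \ge 1-\epsilon
  \;\Longleftrightarrow\;
  \sum_{i=1}^{m} \log y_i \ge \log(1-\epsilon).
\]
```

The transformed constraint is concave in the $y_i$, which is what leaves only the bi-convex coupling between the decision variables and the risk levels for the dynamical neural network to handle.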