Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 61425008, 61333004 and 61273054), the Top-Notch Young Talents Program of China, and the Aeronautical Foundation of China (Grant No. 20135851042).
Abstract: As one of the major contributions of biology to competitive decision making, evolutionary game theory provides a useful tool for studying the evolution of cooperation. To achieve the optimal solution for unmanned aerial vehicles (UAVs) carrying out a sensing task, this paper presents a Markov decision evolutionary game (MDEG) based learning algorithm. Each individual in the algorithm follows a Markov decision strategy to maximize its payoff against the well-known Tit-for-Tat strategy. Simulation results demonstrate that the MDEG based approach effectively improves the collective payoff of the team. The proposed algorithm obtains not only the best action sequence but also a sub-optimal Markov policy that is independent of the game duration. Furthermore, the paper studies the emergence of cooperation in the evolution of self-regarding UAVs. The results show that the emergence of cooperation can be attributed both to the adaptive ability of the MDEG based approach and to the balance the Tit-for-Tat strategy strikes between retaliation and forgiveness.
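To make the game setting concrete, below is a minimal sketch of a memory-one (Markov) strategy playing an iterated game against Tit-for-Tat. The prisoner's-dilemma payoff values, the policy parameterization, and the initial state are illustrative assumptions, not the paper's exact formulation.

```python
# Sketch: a memory-one Markov policy vs. Tit-for-Tat (assumed setup).
import random

# Standard prisoner's dilemma payoffs for the row player (assumed values).
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}

def tit_for_tat(history):
    """Cooperate first, then copy the opponent's previous move."""
    return "C" if not history else history[-1][0]

def markov_policy(history, p_cooperate):
    """Memory-one policy: cooperation probability conditioned on the
    previous joint action, which serves as the Markov state."""
    state = history[-1] if history else ("C", "C")  # assumed initial state
    return "C" if random.random() < p_cooperate[state] else "D"

def play(p_cooperate, rounds=100):
    """Accumulate the Markov player's payoff over a finite game."""
    history, total = [], 0
    for _ in range(rounds):
        a = markov_policy(history, p_cooperate)
        b = tit_for_tat(history)
        total += PAYOFF[(a, b)]
        history.append((a, b))
    return total

# Example: an always-cooperate policy sustains mutual cooperation
# against Tit-for-Tat, yielding 3 per round (~300 over 100 rounds).
always_c = {state: 1.0 for state in PAYOFF}
print(play(always_c))
```

Because the policy conditions only on the previous joint action, it remains well defined regardless of how long the game runs, which mirrors the abstract's point that the learned Markov policy is independent of the game duration.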
Funding: This work was supported in part by the US Department of Energy (DOE), Office of Electricity and Office of Energy Efficiency and Renewable Energy, under contract DE-AC05-00OR22725; in part by CURENT, an Engineering Research Center funded by the US National Science Foundation (NSF) and DOE under NSF award EEC-1041877; and in part by NSF award ECCS-1809458.
Abstract: In this paper, a day-ahead electricity market bidding problem with multiple strategic generation company (GENCO) bidders is studied. The problem is formulated as a Markov game in which GENCO bidders interact with each other to develop their optimal day-ahead bidding strategies. To handle unobservable information in the problem, a model-free, data-driven approach known as multi-agent deep deterministic policy gradient (MADDPG) is applied to approximate the Nash equilibrium (NE) of the Markov game. The MADDPG algorithm generalizes well owing to the automatic feature extraction ability of its deep neural networks. The algorithm is tested on an IEEE 30-bus system with three competitive GENCO bidders in both an uncongested case and a congested case. Comparisons with a truthful bidding strategy and with state-of-the-art deep reinforcement learning methods, including deep Q-network and deep deterministic policy gradient (DDPG), demonstrate that the applied MADDPG algorithm finds a superior bidding strategy for all market participants, with increased profit gains. In addition, a comparison with a conventional model-based method shows that MADDPG has higher computational efficiency, making it feasible for real-world applications.
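The core structural idea of MADDPG is centralized training with decentralized execution: each agent's actor sees only its local observation, while a per-agent critic sees all agents' observations and actions during training. Below is a minimal sketch of that structure; the network sizes, batch layout, and hyperparameters are assumptions for illustration (target networks, soft updates, and the bidding environment itself are omitted), not the paper's exact architecture.

```python
# Sketch: MADDPG actor/critic structure (assumed dimensions and layout).
import torch
import torch.nn as nn

class Actor(nn.Module):
    """Maps an agent's local observation to a continuous action (e.g. a bid)."""
    def __init__(self, obs_dim, act_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 64), nn.ReLU(),
            nn.Linear(64, act_dim), nn.Tanh())  # actions scaled to [-1, 1]

    def forward(self, obs):
        return self.net(obs)

class CentralCritic(nn.Module):
    """Centralized critic: scores the joint observation-action pair,
    so it conditions on every agent's behavior during training."""
    def __init__(self, n_agents, obs_dim, act_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_agents * (obs_dim + act_dim), 64), nn.ReLU(),
            nn.Linear(64, 1))

    def forward(self, all_obs, all_acts):
        return self.net(torch.cat([all_obs, all_acts], dim=-1))

def update_agent(i, actors, critics, batch, gamma=0.99):
    """One gradient step for agent i from an assumed replay batch.
    Shapes: obs/acts/next_obs are [B, n_agents, dim]; rews is [B, n_agents]."""
    obs, acts, rews, next_obs = batch
    n = obs.shape[1]
    # Critic target: bootstrap with every agent's next action
    # (target networks omitted for brevity).
    with torch.no_grad():
        next_acts = torch.stack(
            [actors[j](next_obs[:, j]) for j in range(n)], dim=1)
        y = rews[:, i:i + 1] + gamma * critics[i](
            next_obs.flatten(1), next_acts.flatten(1))
    q = critics[i](obs.flatten(1), acts.flatten(1))
    critic_loss = nn.functional.mse_loss(q, y)
    # Actor loss: ascend the centralized Q w.r.t. agent i's own action,
    # holding the other agents' sampled actions fixed.
    acts_i = acts.clone()
    acts_i[:, i] = actors[i](obs[:, i])
    actor_loss = -critics[i](obs.flatten(1), acts_i.flatten(1)).mean()
    return critic_loss, actor_loss
```

At execution time each agent acts from its own actor alone, which is what makes the approach usable when the other bidders' information is unobservable, as the abstract describes.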