Funding: This work is partially supported by the National Natural Science Foundation of China (Nos. 61833013 and 62003162), the Natural Science Foundation of Jiangsu Province of China (No. BK20200416), the China Postdoctoral Science Foundation (Nos. 2020TQ0151 and 2020M681590), and the Natural Sciences and Engineering Research Council of Canada.
Abstract: Unmanned aerial vehicles (UAVs) have been extensively used in civil and industrial applications owing to the rapid development of guidance, navigation and control (GNC) technologies. In particular, deep reinforcement learning methods for motion control have made major progress recently, since the deep Q-learning approach was successfully extended to continuous action domains. This paper proposes an improved deep deterministic policy gradient (DDPG) algorithm for the path following control problem of a UAV. A specific reward function is designed to minimize the cross-track error of the path following problem. In the training phase, a double experience replay buffer (DERB) is used to increase the learning efficiency and accelerate convergence. First, the model of the UAV path following problem is established. Next, the framework of the DDPG algorithm is constructed. Then the state space, action space and reward function of the UAV path following algorithm are designed, and the DERB is introduced to accelerate the training phase. Finally, simulation results are presented to show the effectiveness of the proposed DERB-DDPG method.
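The abstract names two ingredients without giving their details: a reward based on the cross-track error and a double experience replay buffer. The following is only a minimal sketch of what such components might look like, not the paper's implementation. Everything specific here is an assumption made for illustration: the straight-line path segment, the sign convention of the error (positive to the left of the path direction), the reward form `-abs(error)`, the rule that routes transitions into a "good" or "ordinary" buffer by a reward threshold, and the names `cross_track_error`, `reward`, `DoubleReplayBuffer`, `reward_threshold` and `good_fraction`.

```python
import math
import random
from collections import deque


def cross_track_error(p, a, b):
    """Signed perpendicular distance from point p to the line through a and b.

    Assumed convention: positive when p lies to the left of the direction a -> b.
    """
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length = math.hypot(dx, dy)
    # 2-D cross product of the path direction with the vector a -> p
    return (dx * (py - ay) - dy * (px - ax)) / length


def reward(p, a, b):
    """Hypothetical reward: 0 on the path, more negative farther from it."""
    return -abs(cross_track_error(p, a, b))


class DoubleReplayBuffer:
    """Two FIFO buffers; the routing rule and mixing ratio are assumptions."""

    def __init__(self, capacity=10000, reward_threshold=-1.0,
                 good_fraction=0.5, seed=None):
        self.good = deque(maxlen=capacity)      # transitions with high reward
        self.ordinary = deque(maxlen=capacity)  # all other transitions
        self.reward_threshold = reward_threshold
        self.good_fraction = good_fraction
        self.rng = random.Random(seed)

    def add(self, state, action, r, next_state, done):
        # Route by reward: at or above the threshold goes to the good buffer
        target = self.good if r >= self.reward_threshold else self.ordinary
        target.append((state, action, r, next_state, done))

    def __len__(self):
        return len(self.good) + len(self.ordinary)

    def sample(self, batch_size):
        # Draw the requested share from the good buffer, the rest from ordinary
        n_good = min(int(batch_size * self.good_fraction), len(self.good))
        n_ord = min(batch_size - n_good, len(self.ordinary))
        # Top up from the good buffer if the ordinary buffer is short
        n_good = min(batch_size - n_ord, len(self.good))
        batch = (self.rng.sample(list(self.good), n_good)
                 + self.rng.sample(list(self.ordinary), n_ord))
        self.rng.shuffle(batch)
        return batch
```

In a DDPG training loop, `add` would be called after each environment step and `sample` before each actor-critic update. Mixing the two buffers is one plausible way to expose the learner to successful path-tracking experience more often, which is the kind of effect the abstract attributes to the DERB.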