This paper addresses mapless navigation for unmanned aerial vehicles in scenarios with limited sensor accuracy and computing capability. A novel learning-based algorithm, soft actor-critic from demonstrations (SACfD), is proposed that integrates reinforcement learning with imitation learning. Specifically, the maximum entropy reinforcement learning framework is introduced to enhance the algorithm's exploration capability; on this basis, the paper explores how to make full use of demonstration data to significantly accelerate convergence while reliably improving policy performance. Further, the proposed algorithm is used to implement mapless navigation for unmanned aerial vehicles, and experimental results show that it outperforms existing algorithms.
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 12072088, 62003117, and 62003118), the National Defense Basic Scientific Research Program of China (Grant No. JCKY2020603B010), and the Natural Science Foundation of Heilongjiang Province, China (Grant No. ZD2020F001).
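The abstract does not say how SACfD combines demonstration data with the maximum entropy objective, so the following is only a minimal sketch of two generic building blocks such a method could use. The entropy-regularised TD target is the standard soft actor-critic form. The `MixedReplayBuffer` class, its `demo_fraction` parameter, and the rule of never evicting demonstrations are hypothetical illustrations, not the authors' design.

```python
import random
from collections import deque


class MixedReplayBuffer:
    """Hypothetical buffer: demonstrations are kept permanently and
    sampled alongside the agent's own transitions."""

    def __init__(self, capacity, demo_fraction=0.25, seed=0):
        self.demos = []                      # demonstration transitions, never evicted
        self.agent = deque(maxlen=capacity)  # agent transitions, oldest evicted first
        self.demo_fraction = demo_fraction   # assumed share of each batch taken from demos
        self.rng = random.Random(seed)

    def add_demo(self, transition):
        self.demos.append(transition)

    def add(self, transition):
        self.agent.append(transition)

    def sample(self, batch_size):
        # Take up to demo_fraction of the batch from demonstrations,
        # and fill the remainder from agent experience.
        n_demo = min(len(self.demos), round(batch_size * self.demo_fraction))
        n_agent = min(len(self.agent), batch_size - n_demo)
        batch = self.rng.sample(self.demos, n_demo)
        batch += self.rng.sample(list(self.agent), n_agent)
        self.rng.shuffle(batch)
        return batch


def soft_td_target(reward, done, next_q, next_log_prob, gamma=0.99, alpha=0.2):
    """Standard soft actor-critic target:
    r + gamma * (1 - done) * (Q(s', a') - alpha * log pi(a' | s'))."""
    return reward + gamma * (1.0 - done) * (next_q - alpha * next_log_prob)
```

The temperature `alpha` weights the entropy bonus: a larger value rewards more random actions and therefore more exploration, which is the role the abstract attributes to the maximum entropy framework.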