Abstract
To address the problem of a UAV tracking a maneuvering target in a complex obstacle environment, this paper proposes a navigation and tracking strategy based on IMM-PPO. The state of the maneuvering target is estimated by mixing multiple motion models, and a reward and penalty function is designed around target tracking performance, approach time, and obstacle constraints. A proximal policy optimization (PPO) algorithm framework is built on an Actor-Critic network structure, and the network parameters that maximize the reward are trained through interaction between the agent and the environment. The trained decision network can use environmental information to navigate around obstacles and track the maneuvering target stably. Simulation results show that, compared with traditional obstacle avoidance and tracking algorithms, the IMM-PPO strategy achieves better tracking performance, faster pursuit, and a shorter obstacle avoidance path. It also retains a degree of autonomous tracking ability when the initial conditions change, which makes it advantageous for UAV maneuvering target tracking tasks.
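As a rough illustration of the proximal policy optimization step mentioned in the abstract, the sketch below computes the widely used clipped surrogate loss for a batch of samples. The abstract does not say which PPO variant, clipping range, or advantage estimator the authors use, so the function name `ppo_clip_loss`, the `clip_eps=0.2` default, and the plain-list inputs are all assumptions made for this example.

```python
import math

def ppo_clip_loss(new_logp, old_logp, advantages, clip_eps=0.2):
    """Clipped surrogate loss (to be minimized) in the standard PPO form.

    new_logp / old_logp: log-probabilities of the taken actions under the
    current and the data-collecting policy (hypothetical inputs).
    advantages: advantage estimates, e.g. derived from the Critic's values.
    clip_eps: clipping range; 0.2 is an assumed value, not from the paper.
    """
    total = 0.0
    for lp_new, lp_old, adv in zip(new_logp, old_logp, advantages):
        ratio = math.exp(lp_new - lp_old)  # pi_new(a|s) / pi_old(a|s)
        clipped_ratio = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        # Take the more pessimistic of the unclipped and clipped objectives.
        total += min(ratio * adv, clipped_ratio * adv)
    return -total / len(advantages)
```

When the new and old policies agree, the ratio is 1 and the loss reduces to the negative mean advantage. Clipping only limits how much a single update can gain from moving the ratio away from 1.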
Authors
成旭明
丛玉华
欧阳权
王志胜
CHENG Xuming; CONG Yuhua; OUYANG Quan; WANG Zhisheng (College of Automation Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing 210016, China)
Source
《弹箭与制导学报》
Peking University Core Journal (北大核心)
2022, No. 6, pp. 46-54 (9 pages)
Journal of Projectiles, Rockets, Missiles and Guidance
Keywords
reinforcement learning
multi-rotor UAV
target tracking
path planning