Abstract
A guided Minimax-DDQN (Minimax-Double Deep Q-Network) algorithm was designed to address the difficulty of predicting enemy aircraft maneuver strategies and the low winning rate caused by the complex environment information and strongly adversarial nature of Unmanned Aerial Vehicle (UAV) air combat. Firstly, a guided strategy exploration mechanism was proposed on the basis of the Minimax decision method. Then, combined with the guided Minimax strategy, a DDQN (Double Deep Q-Network) algorithm was designed with the aim of improving the update efficiency of the Q-network. Finally, a progressive three-stage network training method was proposed, in which adversarial training between different decision models yields a better-optimized decision model. Experimental results show that, compared with algorithms such as Minimax-DQN and Minimax-DDQN, the proposed algorithm improves the success rate of pursuing a straight-flying target by 14% to 60%, and its winning rate against the DDQN algorithm is no less than 60%. Thus, compared with algorithms such as DDQN and Minimax-DDQN, the proposed algorithm has stronger decision-making capability and better adaptability in highly adversarial combat environments.
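The abstract does not give the update rule itself; the following is a minimal PyTorch sketch of how a Double-DQN target can be combined with pure-strategy Minimax action selection over a joint-action value table Q(s, a, o). All names (QNet, minimax_ddqn_targets), network shapes, and the pure-strategy simplification are illustrative assumptions, not details taken from the paper.

```python
# Hedged sketch: Minimax action selection + Double-DQN target computation.
# Assumes a joint-action Q-network returning a [n_own x n_opp] value table
# per state; names and shapes are illustrative, not from the paper.
import torch
import torch.nn as nn

class QNet(nn.Module):
    """Maps a state vector to a table of joint-action values Q(s, a, o)."""
    def __init__(self, state_dim, n_own, n_opp, hidden=128):
        super().__init__()
        self.n_own, self.n_opp = n_own, n_opp
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, n_own * n_opp),
        )

    def forward(self, s):
        # s: [B, state_dim] -> Q table: [B, n_own, n_opp]
        return self.net(s).view(-1, self.n_own, self.n_opp)

@torch.no_grad()
def minimax_ddqn_targets(online, target, r, s_next, done, gamma=0.99):
    """DDQN-style target with Minimax selection: the online network picks
    the minimax joint action, the target network evaluates it."""
    q_online = online(s_next)                       # [B, A, O]
    worst_case = q_online.min(dim=2).values         # min over opponent actions
    a_star = worst_case.argmax(dim=1)               # max over own actions
    batch = torch.arange(len(a_star))
    o_star = q_online[batch, a_star].argmin(dim=1)  # opponent best response
    q_eval = target(s_next)[batch, a_star, o_star]  # evaluate with target net
    return r + gamma * (1.0 - done) * q_eval
```

As in standard DDQN, decoupling action selection (online network) from action evaluation (target network) mitigates the value-overestimation bias of single-network Q-targets, which is the motivation behind the abstract's claim of improved Q-network update efficiency; the guided exploration mechanism and the three-stage training procedure described in the paper are separate from this target computation.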
Authors
WANG Yu; REN Tianjun; FAN Zilin (School of Automation, Shenyang Aerospace University, Shenyang, Liaoning 110136, China)
Source
Journal of Computer Applications (《计算机应用》)
CSCD
Peking University Core Journals (北大核心)
2023, Issue 8, pp. 2636-2643 (8 pages)
Funding
National Natural Science Foundation of China (61906125)
Scientific Research Fund of the Education Department of Liaoning Province (LJKZ0222)
Keywords
Unmanned Aerial Vehicle (UAV) air combat
autonomous decision-making
deep reinforcement learning
Double Deep Q-Network(DDQN)
multi-stage training