Abstract
To improve the survival rate of unmanned aerial vehicles (UAVs) in complex air combat scenarios, maneuver strategies are generated with reinforcement learning on a public UAV air combat game simulation platform. Building on the double deep Q-network (DDQN) and deep deterministic policy gradient (DDPG) algorithms, this paper proposes a unit state sequence (USS) and uses a gated recurrent unit (GRU) to fuse the situation features in the USS, with the aim of improving state feature recognition and algorithm convergence in complex air combat scenarios. Experimental results show that, against missiles guided by the standard proportional navigation algorithm, the agent achieves a 98% survival rate in missile evasion, and that it still achieves an 88% survival rate in complex scenarios where multiple missiles attack simultaneously. Compared with traditional simple maneuver modes, the UAV survival rate is significantly improved.
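The idea of fusing a unit state sequence with a GRU can be illustrated by a minimal sketch. The abstract does not specify the USS layout, the network sizes, or how the fused vector feeds DDQN or DDPG, so everything below is an assumption made for illustration: the class name `GRUFusion`, the 6-dimensional per-unit state, the 8-dimensional hidden state, and the random untrained weights are all hypothetical. The sketch only shows how a GRU folds a variable number of unit states into one fixed-length feature vector.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class GRUFusion:
    """Hypothetical GRU encoder: folds a sequence of unit states into one vector."""

    def __init__(self, input_dim, hidden_dim, seed=0):
        rng = np.random.default_rng(seed)
        scale = 1.0 / np.sqrt(hidden_dim)
        # One weight matrix per gate, applied to the concatenation [x_t, h_{t-1}].
        shape = (hidden_dim, input_dim + hidden_dim)
        self.W_z = rng.uniform(-scale, scale, shape)  # update gate
        self.W_r = rng.uniform(-scale, scale, shape)  # reset gate
        self.W_h = rng.uniform(-scale, scale, shape)  # candidate state
        self.b_z = np.zeros(hidden_dim)
        self.b_r = np.zeros(hidden_dim)
        self.b_h = np.zeros(hidden_dim)
        self.hidden_dim = hidden_dim

    def step(self, x, h):
        """One GRU update for a single unit state x and previous hidden state h."""
        xh = np.concatenate([x, h])
        z = sigmoid(self.W_z @ xh + self.b_z)
        r = sigmoid(self.W_r @ xh + self.b_r)
        h_cand = np.tanh(self.W_h @ np.concatenate([x, r * h]) + self.b_h)
        # Convention used here: z weights the candidate state.
        return (1.0 - z) * h + z * h_cand

    def fuse(self, unit_states):
        """Run the GRU over all unit states and return the final hidden state."""
        h = np.zeros(self.hidden_dim)
        for x in unit_states:
            h = self.step(np.asarray(x, dtype=float), h)
        return h


# Assumed example: each row is the state of one unit (for instance one missile).
fusion = GRUFusion(input_dim=6, hidden_dim=8)
one_missile = np.random.default_rng(1).normal(size=(1, 6))
three_missiles = np.random.default_rng(2).normal(size=(3, 6))
f1 = fusion.fuse(one_missile)
f3 = fusion.fuse(three_missiles)
```

Both `f1` and `f3` have the same length regardless of how many units are in the scene, which is what would let a fixed-size Q-network or actor-critic network consume scenes with a varying number of missiles.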
Authors
吴冯国
陶伟
李辉
张建伟
郑成辰
WU Fengguo; TAO Wei; LI Hui; ZHANG Jianwei; ZHENG Chengchen (National Key Laboratory of Fundamental Science on Synthetic Vision, Sichuan University, Chengdu 610065, China; China Ship Development and Design Center, Wuhan 430064, China; School of Computer Science, Sichuan University, Chengdu 610065, China)
Source
《系统工程与电子技术》
EI
CSCD
PKU Core (Peking University core journal list)
2023, Issue 6, pp. 1702-1711 (10 pages)
Systems Engineering and Electronics
Funding
Supported by the "13th Five-Year Plan" Military Common Information System Equipment Pre-research Project (31505550302).
Keywords
deep reinforcement learning (DRL)
unmanned aerial vehicles (UAVs)
unit state sequence (USS)
gated recurrent unit (GRU)