Abstract
Aiming at the attack-defense game between a high-speed aircraft and an interceptor, an improved algorithm based on the Double Deep Q-Network (DDQN) is studied. To address the low sample utilization efficiency of the classical DDQN, the algorithm sets up multiple experience replay buffers and combines the accumulated Q-value temporal-difference error (TD-error) of an engagement episode with its accumulated reward; fuzzy reasoning then computes the probability of storing the episode's samples in each buffer. An integral sampler, designed from the temporal-difference error of the accumulated reward, draws training samples from the different buffers. The reward function is designed to minimize the aircraft's mechanical energy consumption while ensuring successful penetration. Experimental results show that, compared with the classical DDQN, the improved algorithm effectively increases sample utilization efficiency and provides a new approach to the maneuvering penetration problem of high-speed aircraft.
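The abstract's storage scheme (routing an episode's transitions into different experience pools with a probability obtained by fuzzy reasoning over cumulative TD-error and cumulative reward, then sampling from the pools with a tunable mix) can be illustrated with a minimal sketch. The membership parameters, the two-pool layout, and the `high_fraction` mixing weight are illustrative assumptions, not the paper's actual design:

```python
import random

def triangular(x, a, b, c):
    """Triangular fuzzy membership function on [a, c] peaking at b."""
    if x <= a or x >= c:
        return 0.0
    if x == b:
        return 1.0
    return (x - a) / (b - a) if x < b else (c - x) / (c - b)

def storage_probability(td_error_sum, reward_sum):
    """Toy fuzzy rule combining an episode's (normalized) cumulative
    TD-error and cumulative reward into a probability of storing its
    transitions in the 'high-value' pool. Breakpoints are assumed."""
    high_td = triangular(td_error_sum, 0.3, 1.0, 1.7)  # "TD-error is large"
    high_rw = triangular(reward_sum, 0.3, 1.0, 1.7)    # "reward is large"
    # Max aggregation: either condition raises the storage probability.
    return max(high_td, high_rw)

class MultiBufferStore:
    """Two experience pools; whole episodes are routed probabilistically."""
    def __init__(self, seed=0):
        self.high, self.low = [], []
        self.rng = random.Random(seed)

    def store_episode(self, transitions, td_error_sum, reward_sum):
        p = storage_probability(td_error_sum, reward_sum)
        pool = self.high if self.rng.random() < p else self.low
        pool.extend(transitions)

    def sample(self, batch_size, high_fraction):
        """Draw a mixed batch; high_fraction stands in for the integral
        sampler's weighting between the pools."""
        n_high = min(int(batch_size * high_fraction), len(self.high))
        n_low = min(batch_size - n_high, len(self.low))
        return (self.rng.sample(self.high, n_high)
                + self.rng.sample(self.low, n_low))
```

In the paper the mixing weight is produced by an integral sampler driven by the TD-error of the accumulated reward; here it is simply passed in as a constant to keep the sketch self-contained.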
Authors
He Xiangyuan; Chen Jun; Guo Hao; Yu Zhuoyang; Tian Bo (Science and Technology on Space Physics Laboratory, Beijing 100076, China)
Source
Aerospace Control (《航天控制》), 2022, No. 4, pp. 76-83 (8 pages)
Indexed in CSCD and the Peking University Core Journals list (北大核心)
Keywords
High speed aircraft
Interceptor
Improved DDQN
Fuzzy reasoning
Attack-defense game