Abstract
In unmanned aerial vehicle (UAV) formation tracking missions, false data injection (FDI) attackers can inject misleading data into the control commands and prevent the UAVs from forming the specified formation, so a secure formation tracking controller is needed. This paper models the attack-defense process as a zero-sum graphical game in which the FDI attacker and the secure controller are the players: the attacker aims to maximize a prescribed cost function, while the secure controller aims to minimize it. Solving the game and obtaining the optimal secure control policy requires solving the Hamilton-Jacobi-Isaacs (HJI) equation, a coupled partial differential equation that is difficult to solve directly. Therefore, a finite-time convergent online reinforcement learning algorithm combined with an experience replay mechanism is introduced, and a critic-only neural network is designed to approximate the value function and obtain the optimal secure control policy. Simulation results verify the effectiveness of the proposed algorithm.
Authors
Gong Zhenyu (弓镇宇)
Yang Feisheng (杨飞生)
Northwestern Polytechnical University, Xi’an 710072, China
Source
Aeronautical Science & Technology (《航空科学技术》), 2024, No. 4, pp. 25-30 (6 pages)
Funding
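National Natural Science Foundation of China (62073269)
Aeronautical Science Foundation of China (2020Z034053002)
Key Research and Development Program of Shaanxi Province (2022GY-244)
Natural Science Foundation of Chongqing (CSTB2022NSCQ-MSX0963)
Guangdong Basic and Applied Basic Research Foundation (2023A1515011220)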
Keywords
FDI attack
multi-UAV
online reinforcement learning
optimal control
zero-sum graphical game