Abstract
To give the attitude controller of a quadcopter unmanned aerial vehicle (UAV) strong target-value tracking and resistance to external disturbances, an attitude controller design based on the deep deterministic policy gradient (DDPG) algorithm with a reference model is proposed. The method uses a neural network to map the state of the quadcopter UAV directly to the output. The reinforcement learning algorithm combines DDPG with deep neural networks. A reference model is added to the DDPG structure to avoid the system overshoot caused by excessive control inputs, which improves the stability and robustness of the system. In addition, the composition of the reinforcement learning reward is modified, which eliminates the steady-state error of the system. Experimental results show that the control method tracks the target value quickly and is strongly robust. Compared with conventional controllers, the proposed controller improves both target-value tracking and disturbance rejection.
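The abstract does not give the form of the reference model or of the modified reward, so the following is only a minimal sketch under stated assumptions. It assumes a critically damped second-order reference model that smooths a step target, and a reward that penalizes the deviation from the reference, an accumulated-error term, and control effort. The names `ReferenceModel` and `shaped_reward`, and all gains and weights, are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class ReferenceModel:
    """Second-order reference model (an assumed form, not the paper's)."""
    omega: float = 4.0  # natural frequency in rad/s (illustrative value)
    zeta: float = 1.0   # damping ratio; 1.0 means critically damped
    y: float = 0.0      # reference angle
    dy: float = 0.0     # reference angular rate

    def step(self, target: float, dt: float) -> float:
        # Semi-implicit Euler step of
        # y'' = omega^2 * (target - y) - 2 * zeta * omega * y'
        ddy = (self.omega ** 2 * (target - self.y)
               - 2.0 * self.zeta * self.omega * self.dy)
        self.dy += ddy * dt
        self.y += self.dy * dt
        return self.y


def shaped_reward(angle, ref_angle, err_integral, action,
                  w_err=1.0, w_int=0.5, w_act=0.01):
    """Hypothetical reward: penalize the tracking error against the
    reference, the accumulated error, and the control effort."""
    err = ref_angle - angle
    return -(w_err * abs(err) + w_int * abs(err_integral)
             + w_act * action ** 2)
```

In a training loop, the agent would be rewarded for following `ReferenceModel.step(target, dt)` rather than the raw step target, which discourages the large control inputs that a sudden target change would otherwise demand. The accumulated-error term is one plausible way to penalize a persistent offset; the paper's actual reward composition may differ.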
Authors
王伟
吴昊
刘鸿勋
杨溢
WANG Wei; WU Hao; LIU Hong-xun; YANG Yi (School of Automation, Nanjing University of Information Science & Technology, Nanjing 210000, China)
Source
《科学技术与工程》
Peking University Core Journals (北大核心)
2023, No. 34, pp. 14888-14895 (8 pages)
Science Technology and Engineering
Funding
Basic Research Program of the Jiangsu Provincial Department of Science and Technology (Natural Science Foundation) (BK20210643).
Keywords
deep reinforcement learning
attitude control
neural network
reference model