Abstract
Because of the great distance from Earth, the large delays and errors in tracking and control, and a flight environment that is complex and hard to predict in advance, autonomous guidance for planetary soft landing faces challenges such as difficult horizontal position estimation, scarce navigation reference information, and landing on complex terrain. To address these challenges, a model-based reinforcement learning guidance method based on guided policy search (GPS) is proposed. With this method, a lander whose initial state is perturbed can still land at the designated position while satisfying the constraints, without replanning. The method first uses an iterative linear quadratic regulator (iLQR) as the controller to generate initial trajectories; second, a multi-layer neural network is used to fit the guidance policy; finally, the controller supervises policy learning until it converges to a feasible policy. Simulations of soft landing on a planetary surface show that the algorithm achieves rapid soft landing under varied initial states after only a few iterations. On the one hand, this demonstrates the high data efficiency of model-based reinforcement learning; on the other hand, it shows that reinforcement learning methods have broad application prospects in deep space exploration.
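The three steps named in the abstract (controller generates trajectories, a policy is fitted, the controller supervises the policy) can be illustrated with a deliberately simplified sketch. Everything below is an assumption made for illustration and is not taken from the paper: the dynamics are a toy 1-D double integrator in normalized units, the teacher is a finite-horizon LQR (which is what iLQR reduces to for linear dynamics and quadratic cost), the policy is a linear map fitted by least squares rather than a multi-layer neural network, and the policy-agreement term of full guided policy search is omitted. The names `step`, `lqr_gains`, `rollout`, and `fit_linear_policy` are hypothetical.

```python
import random

DT, N = 0.1, 200                 # step size and horizon length (assumed values)
A = [[1.0, DT], [0.0, 1.0]]      # state = (altitude error, vertical speed)
B = [0.5 * DT * DT, DT]          # control = net vertical acceleration
Q, R, QF = [1.0, 1.0], 1.0, [100.0, 100.0]  # diagonal cost weights (assumed)

def step(x, u):
    """One step of the toy linear dynamics x_next = A x + B u."""
    return [x[0] + DT * x[1] + B[0] * u, x[1] + B[1] * u]

def lqr_gains():
    """Backward Riccati recursion; returns time-varying gains K_k with u = -K_k x."""
    P = [[QF[0], 0.0], [0.0, QF[1]]]
    gains = []
    for _ in range(N):
        PB = [P[i][0] * B[0] + P[i][1] * B[1] for i in range(2)]
        PA = [[P[i][0] * A[0][j] + P[i][1] * A[1][j] for j in range(2)]
              for i in range(2)]
        s = R + B[0] * PB[0] + B[1] * PB[1]            # scalar R + B'PB
        K = [(B[0] * PA[0][j] + B[1] * PA[1][j]) / s for j in range(2)]
        AtPA = [[A[0][i] * PA[0][j] + A[1][i] * PA[1][j] for j in range(2)]
                for i in range(2)]
        AtPB = [A[0][i] * PB[0] + A[1][i] * PB[1] for i in range(2)]
        # P = Q + A'PA - A'PB K
        P = [[AtPA[i][j] - AtPB[i] * K[j] + (Q[i] if i == j else 0.0)
              for j in range(2)] for i in range(2)]
        gains.append(K)
    gains.reverse()                                     # gains[k] applies at step k
    return gains

def rollout(x0, policy):
    """Simulate N steps; returns the final state and the visited (state, control) pairs."""
    x, data = list(x0), []
    for k in range(N):
        u = policy(k, x)
        data.append((x, u))
        x = step(x, u)
    return x, data

def fit_linear_policy(samples):
    """Least-squares fit of u = w0*h + w1*v via the 2x2 normal equations."""
    s00 = sum(x[0] * x[0] for x, _ in samples)
    s01 = sum(x[0] * x[1] for x, _ in samples)
    s11 = sum(x[1] * x[1] for x, _ in samples)
    b0 = sum(x[0] * u for x, u in samples)
    b1 = sum(x[1] * u for x, u in samples)
    det = s00 * s11 - s01 * s01
    return [(s11 * b0 - s01 * b1) / det, (s00 * b1 - s01 * b0) / det]

random.seed(0)
K = lqr_gains()
teacher = lambda k, x: -(K[k][0] * x[0] + K[k][1] * x[1])

# Step 1: the controller generates trajectories from perturbed initial states.
samples = []
for _ in range(5):
    x0 = [10.0 + random.uniform(-1.0, 1.0), -1.0 + random.uniform(-0.2, 0.2)]
    _, data = rollout(x0, teacher)
    samples += data

# Steps 2 and 3: the policy is fitted to the controller's state-control pairs.
w = fit_linear_policy(samples)
student = lambda k, x: w[0] * x[0] + w[1] * x[1]

# The learned policy is applied to an unseen initial state without replanning.
xf, _ = rollout([10.5, -0.9], student)
print("policy weights:", w)
print("final state under learned policy:", xf)
```

In this sketch the fitted policy is time-invariant, so it can be run from a new initial state without recomputing the Riccati recursion, which mirrors the "no replanning" property the abstract describes. A nonlinear lander model would need true iLQR linearization around each trajectory and a nonlinear policy class.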
Authors
ZHANG Yangkang (张阳康)
SUN Chen (孙晨)
PAN Binfeng (泮斌峰)
School of Astronautics, Northwestern Polytechnical University, Xi'an 710072; National Key Laboratory of Aerospace Flight Dynamics, Xi'an 710072
Source
Flight Control & Detection (《飞控与探测》)
2021, Issue 5, pp. 34-43 (10 pages)
Funding
Equipment Pre-Research Laboratory Fund (6142210200312).
Keywords
iterative linear quadratic regulator (iLQR)
guided policy search (GPS)
model-based reinforcement learning
planetary soft landing