Abstract
For asteroid exploration missions, a deep space probe composed of multiple flexibly connected nodes is a candidate solution to the overturning and rebound problems that affect single-node probes during landing. On this basis, we construct a three-node probe with flexible connections, model its soft landing, and propose a multi-task deep reinforcement learning method with a self-attention mechanism. Each node describes its own state relative to the probe body, and the nodes learn jointly to improve their adaptability to complex environments. When extracting features of the probe and obstacles, the self-attention mechanism helps each node focus on its own task and learn a better policy, thereby obtaining a higher reward. Experimental comparison with other methods shows that the proposed method is more conducive to stable landing of the probe.
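The two ideas named in the abstract, describing each node relative to the probe body and applying self-attention over probe and obstacle features, can be illustrated with a minimal sketch. This is not the authors' implementation: the 2-D positions, the function names `relative_state` and `self_attention`, and the use of identity projections (Q = K = V, no learned weights) are all assumptions made for illustration.

```python
import math

def relative_state(entity_pos, base_pos):
    """Express a position in the frame of the probe body (translation only)."""
    return [e - b for e, b in zip(entity_pos, base_pos)]

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(features):
    """Scaled dot-product self-attention with identity projections.

    Assumption: Q = K = V = features. A trained model would use
    learned projection matrices instead.
    """
    d = len(features[0])
    scale = math.sqrt(d)
    outputs, weights = [], []
    for q in features:
        # Similarity of this entity to every entity, itself included.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in features]
        w = softmax(scores)
        weights.append(w)
        # Attention-weighted mix of all entity features.
        outputs.append(
            [sum(w[j] * features[j][i] for j in range(len(features))) for i in range(d)]
        )
    return outputs, weights

# Hypothetical 2-D positions: three probe nodes and one obstacle.
base = [0.0, 10.0]  # probe body, used as the reference frame
nodes = [[-1.0, 10.5], [0.0, 10.0], [1.0, 10.5]]
obstacles = [[0.5, 0.0]]

entities = [relative_state(p, base) for p in nodes + obstacles]
attended, weights = self_attention(entities)
```

Each row of `weights` sums to 1 and indicates how strongly one entity attends to the others. In a multi-task setting, every node would feed its own attended feature into its own policy head, which this sketch does not cover.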
Authors
WANG Xin (王鑫)
ZHAO Qing-jie (赵清杰)
YU Chong-chong (于重重)
ZHANG Chang-chun (张长春)
CHEN Yong-quan (陈涌泉)
Affiliations: School of Computer Science, Beijing Institute of Technology, Beijing 100081, China; School of Artificial Intelligence, Beijing Technology and Business University, Beijing 100048, China
Source
Journal of Astronautics (《宇航学报》)
Indexed in: EI, CAS, CSCD, Peking University Core Journals
2022, Issue 3, pp. 366-373 (8 pages)
Funding
National Key R&D Program of China (2019YFA0706500).
Keywords
Deep space probe
Soft landing
Deep reinforcement learning
Multi-task learning
Self-attention mechanism