Abstract
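Deep reinforcement learning (DRL) has been successfully applied to mobile robot path planning. DRL-based path planning algorithms suit high-dimensional environments and are an important means of enabling mobile robots to learn autonomously. However, training a DRL model requires a large amount of environment-interaction experience, which means higher computational cost. Moreover, the experience replay buffer of DRL algorithms has limited capacity, so effective use of experience cannot be guaranteed. Spiking neural networks (SNNs), one of the key tools of brain-inspired computing, are biologically plausible and can incorporate spatial and temporal information simultaneously, which makes them well suited to robot environment perception and control. Combining SNNs, convolutional neural networks (CNNs), and policy fusion, this work studies DRL-based mobile robot path planning and makes the following contributions: 1) The SCDDPG (spike convolutional deep deterministic policy gradient) algorithm is proposed. It uses CNNs to extract multi-channel features from the input state and SNNs to learn spatio-temporal representations of the extracted features. 2) Building on SCDDPG, the SC2DDPG (state constraint SCDDPG) algorithm is proposed. SC2DDPG uses a purpose-designed state constraint policy to restrict the robot's operating states, which avoids unnecessary environment exploration and speeds up the convergence of the DRL model in SC2DDPG. 3) Building on SCDDPG, the PFTDDPG (policy fusion and transfer SCDDPG) algorithm is proposed. It fuses a staged control mode with the DRL algorithm, applies a wall-following policy to wedge-shaped obstacles in the environment, and introduces transfer learning to transfer prior knowledge between policies. PFTDDPG not only completes path planning tasks that RL alone cannot, but also obtains optimal collision-free paths. In addition, PFTDDPG improves the model's convergence speed and path planning performance. Experimental results demonstrate the effectiveness of the three proposed path planning algorithms. Comparative experiments show that among SpikeDDPG, SCDDPG, SC2DDPG, and PFTDDPG, PFTDDPG performs best in path planning success rate, training convergence speed, and planned path length. This work offers a new approach to mobile robot path planning and broadens the range of DRL-based solutions to it.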
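The SCDDPG idea of passing CNN features through spiking neurons can be illustrated with a minimal forward-pass sketch. It assumes a 1-D laser scan as the state, leaky integrate-and-fire (LIF) neurons driven by constant currents for a fixed number of time steps, and firing-rate decoding; the function names, kernels, and parameters (`steps`, `tau`, `v_threshold`) are hypothetical and not taken from the paper, and no training of the SNN or the DDPG actor is shown.

```python
import math
from typing import List, Sequence

def conv1d_valid(signal: Sequence[float], kernel: Sequence[float]) -> List[float]:
    """Single-channel 1-D 'valid' cross-correlation, as computed by a CNN layer."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]

def lif_spike_rates(currents: Sequence[float], steps: int = 20, tau: float = 2.0,
                    v_threshold: float = 1.0, v_reset: float = 0.0) -> List[float]:
    """Drive one LIF neuron per input with a constant current for `steps`
    time steps and return each neuron's firing rate (spikes / steps)."""
    decay = math.exp(-1.0 / tau)           # membrane leak per time step
    v = [v_reset] * len(currents)
    counts = [0] * len(currents)
    for _ in range(steps):
        for i, current in enumerate(currents):
            v[i] = decay * v[i] + current  # leaky integration of the input
            if v[i] >= v_threshold:        # fire, then reset the membrane
                counts[i] += 1
                v[i] = v_reset
    return [c / steps for c in counts]

def spiking_conv_features(scan: Sequence[float], kernels: Sequence[Sequence[float]],
                          steps: int = 20) -> List[float]:
    """Hypothetical pipeline: multi-channel CNN features (one channel per
    kernel, ReLU) are fed to LIF neurons and decoded as firing rates."""
    features: List[float] = []
    for kernel in kernels:
        features.extend(max(0.0, x) for x in conv1d_valid(scan, kernel))
    return lif_spike_rates(features, steps=steps)

def linear_tanh(rates: Sequence[float], weights: Sequence[Sequence[float]],
                biases: Sequence[float]) -> List[float]:
    """Hypothetical actor head producing bounded continuous actions."""
    return [math.tanh(sum(w * r for w, r in zip(row, rates)) + b)
            for row, b in zip(weights, biases)]

if __name__ == "__main__":
    scan = [1.0, 0.8, 0.3, 0.2, 0.9, 1.0]   # normalized range readings (assumed)
    kernels = [[1.0, -1.0], [0.5, 0.5]]     # edge-like and smoothing channels
    rates = spiking_conv_features(scan, kernels)
    weights = [[0.1] * len(rates), [-0.1] * len(rates)]
    action = linear_tanh(rates, weights, [0.0, 0.0])  # e.g. (linear, angular) velocity
    print(len(rates), action)
```

Rate decoding turns spike counts into values in [0, 1] that an ordinary layer can consume; the paper may encode inputs and decode spikes differently.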
Deep reinforcement learning (DRL) has been successfully applied to mobile robot path planning, and DRL-based path planning methods are suitable for high-dimensional environments and are a crucial means of achieving autonomous learning in mobile robots. However, training DRL models requires a large amount of experience from interacting with the environment, which leads to heavy computational cost. In addition, the limited memory capacity of DRL algorithms means that effective utilization of experience cannot be guaranteed. Spiking neural networks (SNNs), one of the main tools of brain-inspired computing, are suitable for robot environmental perception and control owing to their unique bio-plausibility and their ability to incorporate spatio-temporal information simultaneously. In this paper, we combine SNNs, convolutional neural networks (CNNs), and policy fusion for DRL-based mobile robot path planning, and make the following contributions: 1) We propose the SCDDPG (spike convolutional DDPG) algorithm, which employs CNNs for multi-channel feature extraction from input states and SNNs for spatio-temporal feature extraction. 2) Based on SCDDPG and a designed state constraint policy, the SC2DDPG (state constraint SCDDPG) algorithm is proposed to constrain the robot's operating states, which avoids unnecessary environment exploration and improves the convergence speed of the DRL model in SC2DDPG. 3) Based on SCDDPG, the PFTDDPG (policy fusion and transfer SCDDPG) algorithm is proposed. PFTDDPG implements a "wall-follow" policy to pass wedge-shaped obstacles in the environment. Additionally, PFTDDPG incorporates transfer learning to transfer prior knowledge between policies in mobile robot path planning. PFTDDPG not only completes path planning tasks that cannot be completed by RL alone, but also yields optimal collision-free paths. Furthermore, PFTDDPG improves the convergence speed of the DRL model and the performance of the planned path. Experimental results validate the effectiveness of the three proposed path planning algorithms.
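The policy fusion in PFTDDPG, which hands control to a wall-following behavior at wedge-shaped obstacles and otherwise uses the DRL policy, can be sketched as a simple mode switch. Everything here is an illustrative assumption rather than the paper's design: the scan layout (index 0 is the rightmost beam, the middle third faces forward), the thresholds, the proportional right-hand wall follower, and the caller-supplied `goal_clear` flag used to leave wall-following are all hypothetical, and transfer learning is not shown.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Action = Tuple[float, float]  # (linear velocity, angular velocity); positive angular = turn left

def front_distance(scan: List[float]) -> float:
    """Closest reading in the middle third of the scan (assumed to face forward)."""
    n = len(scan)
    return min(scan[n // 3: 2 * n // 3])

def wall_follow_action(scan: List[float], target_dist: float = 0.5,
                       speed: float = 0.2, gain: float = 1.5,
                       front_stop: float = 0.3) -> Action:
    """Hypothetical right-hand wall follower with a proportional controller."""
    if front_distance(scan) < front_stop:
        return (0.0, 1.0)                # blocked ahead: rotate left in place
    error = scan[0] - target_dist        # scan[0] assumed to be the rightmost beam
    return (speed, -gain * error)        # too far from the wall -> turn right

@dataclass
class FusedPolicy:
    drl_policy: Callable[[List[float]], Action]  # e.g. a trained actor network
    enter_dist: float = 0.4                       # hypothetical switching threshold
    mode: str = "drl"

    def act(self, scan: List[float], goal_clear: bool) -> Action:
        # Enter wall-following when an obstacle blocks the way to the goal;
        # hand control back to the DRL policy once the goal is reachable again.
        if self.mode == "drl" and front_distance(scan) < self.enter_dist and not goal_clear:
            self.mode = "wall_follow"
        elif self.mode == "wall_follow" and goal_clear:
            self.mode = "drl"
        if self.mode == "wall_follow":
            return wall_follow_action(scan)
        return self.drl_policy(scan)

if __name__ == "__main__":
    policy = FusedPolicy(drl_policy=lambda scan: (0.3, 0.0))  # stand-in actor
    print(policy.act([2.0] * 9, goal_clear=False))            # (0.3, 0.0)
    print(policy.act([0.5] * 3 + [0.2] * 3 + [2.0] * 3, goal_clear=False))  # (0.0, 1.0)
    print(policy.act([2.0] * 9, goal_clear=True))             # (0.3, 0.0)
```

Keeping the mode as state, rather than re-deciding from each scan alone, stops the robot from flipping between behaviors while it walks along the wedge; the abstract does not say how the paper detects wedge-shaped obstacles or when it leaves wall-following.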
Authors
AN Yang
WANG Xiuqing
ZHAO Minghua
AN Yang; WANG Xiuqing; ZHAO Minghua (College of Computer and Cyber Security, Hebei Normal University, Shijiazhuang 050024, China; Hebei Provincial Key Laboratory of Network & Information Security, Shijiazhuang 050024, China; Hebei Provincial Engineering Research Center for Supply Chain Big Data Analytics & Data Security, Shijiazhuang 050024, China)
Source
Computer Science (《计算机科学》)
CSCD
Peking University Core Journal
2024, Issue S02, pp. 59-69 (11 pages)
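Funding
National Natural Science Foundation of China, General Program (61673160, 61175059)
Natural Science Foundation of Hebei Province (F2018205102)
Key Research Project of Science and Technology in Higher Education Institutions of Hebei Province (ZD2021063)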
Keywords
Deep reinforcement learning
Spiking neural networks
Convolutional neural networks
Transfer learning
Mobile robot path planning