Abstract
Traditional Q-learning suffers from low planning efficiency and slow convergence when applied to unmanned vehicle path planning. To address these problems, a path planning algorithm for unmanned logistics delivery vehicles based on an improved Q-learning algorithm is proposed. Borrowing the energy iteration principle of the simulated annealing algorithm, the greedy factor ε is adjusted so that it changes dynamically during training, balancing exploration against exploitation and improving planning efficiency. The reward in the reward mechanism is changed from a discrete value to a continuous one that increases as the Euclidean distance between the delivery vehicle and the target point decreases, so that the target point pulls the vehicle toward it and the algorithm converges faster. The improved Q-learning algorithm is simulated in two different environments. The results show that it efficiently plans a path from the starting point to the target point in 34 steps, with better path quality than the comparison algorithms. Changing the road environment verifies the adaptability of the improved algorithm to different environments, where its planning efficiency and convergence speed remain better than those of traditional Q-learning.
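The two modifications described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the paper's formulas, so everything here is an assumption made for illustration: the geometric cooling schedule for ε, the reward form scale / (1 + d), the function names (annealed_epsilon, distance_reward, choose_action, q_update), and all default parameter values are hypothetical and are not the authors' method.

```python
import math
import random


def annealed_epsilon(episode, eps_start=0.9, eps_min=0.05, cooling=0.99):
    """Exploration rate that decays like an annealing temperature.

    Assumed form: geometric cooling with a lower bound. The paper's
    actual schedule is not stated in the abstract.
    """
    return max(eps_min, eps_start * cooling ** episode)


def distance_reward(pos, goal, scale=1.0):
    """Continuous reward that grows as the Euclidean distance shrinks.

    Assumed form: scale / (1 + d), which is bounded and equals `scale`
    at the goal. The paper's actual reward function may differ.
    """
    d = math.dist(pos, goal)
    return scale / (1.0 + d)


def choose_action(q_row, epsilon, rng=random):
    """Epsilon-greedy choice over one row of the Q-table."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))  # explore
    return max(range(len(q_row)), key=q_row.__getitem__)  # exploit


def q_update(q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """Standard tabular Q-learning update, applied in place."""
    q[s][a] += alpha * (r + gamma * max(q[s_next]) - q[s][a])
```

In a training loop, annealed_epsilon(episode) would be evaluated once per episode and passed to choose_action, while distance_reward would replace a fixed per-step reward. Early episodes then explore widely, and later episodes mostly exploit the learned Q-values.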
Authors
王小康
冀杰
刘洋
贺庆
Wang Xiaokang; Ji Jie; Liu Yang; He Qing (College of Engineering and Technology, Southwest University, Chongqing 400715, China)
Source
Journal of System Simulation (《系统仿真学报》)
Indexed in: CAS, CSCD, Peking University Core Journals (北大核心)
2024, Issue 5, pp. 1211-1221 (11 pages)
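Funding
Key R&D Program in Agriculture and Rural Areas, Chongqing Science and Technology Bureau (cstc2021jscx-gksbX0003)
Science and Technology Research Program of the Chongqing Municipal Education Commission (KJZDM202201302)
Chongqing Postdoctoral Research Project (2021XM3070)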
Keywords
Q-learning
path planning
convergence speed
planning efficiency
path quality