Abstract
Path planning for AGVs (automated guided vehicles) has become a key technical problem in fields such as cargo transportation and express parcel sorting. Because such scenarios require many AGVs to cooperate on a task, traditional planning models struggle to coordinate the interactions among multiple AGVs, whereas a divide-and-conquer approach may achieve optimal system performance. On this basis, the paper proposes MRF (maximum reward frequency) Q-learning, a multi-agent independent reinforcement learning algorithm that optimizes task scheduling and path planning simultaneously. During the learning phase an AGV does not need to know the actions of the other AGVs, which alleviates the curse of dimensionality caused by joint actions. A strategy combining Boltzmann and ε-greedy exploration is used to avoid convergence to poor paths. In addition, the algorithm applies the frequency with which the global maximum cumulative return is obtained to the Q-value update formula, so as to maximize the global cumulative return of the multiple AGVs. Simulation experiments show that the algorithm converges to the optimal solution and completes the path planning task in the fewest time steps.
Authors
LIU Hui (刘辉)
XIAO Ke (肖克)
WANG Jing-bo (王京擘)
Department of Automation, Qingdao University, Qingdao 266071, China
Source
Automation & Instrumentation (《自动化与仪表》)
2020, No. 2, pp. 84-89 (6 pages)
Funding
Natural Science Foundation of Shandong Province (ZR2017PF005)
Qingdao Postdoctoral Applied Research Project
Keywords
multi-agent reinforcement learning
AGV path planning
independent reinforcement learning