
Multi-AGV Path Planning Method Based on Multi-agent Reinforcement Learning
Abstract: AGV (automated guided vehicle) path planning has become a key technical problem in fields such as cargo transportation and express parcel sorting. Because such scenarios require many AGVs to cooperate, traditional planning models struggle to coordinate the interactions among multiple AGVs, while a divide-and-conquer approach may achieve optimal system performance. This paper therefore proposes MRF (maximum reward frequency) Q-learning, a multi-agent independent reinforcement learning algorithm that optimizes task scheduling and path planning simultaneously. During the learning phase, each AGV does not need to know the actions of the other AGVs, which alleviates the curse of dimensionality caused by joint actions. A combined Boltzmann and ε-greedy exploration strategy avoids convergence to poor paths. In addition, the algorithm feeds the frequency with which the global maximum cumulative reward is obtained into the Q-value update formula, so that the multi-AGV system maximizes its global cumulative reward. Simulation experiments show that the algorithm converges to the optimal solution and completes the path planning task in the fewest time steps.
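The two ingredients named in the abstract, independent per-AGV Q-tables with a mixed Boltzmann/ε-greedy policy, and a Q update weighted by how often a state-action pair coincided with the best global return, can be sketched as follows. This is a minimal illustration only, not the paper's implementation: the class name, the `(1 + f)` weighting, and the way the best-episode frequency is tracked are assumptions made for the sketch.

```python
import math
import random

class MRFQAgent:
    """One independent learner per AGV: the agent keeps its own Q-table
    and never observes joint actions, avoiding the joint-action
    dimensionality blow-up. The 'maximum reward frequency' idea is
    sketched by counting how often each (state, action) pair appeared
    in an episode that achieved the best global return seen so far,
    and using that frequency to weight the Q update."""

    def __init__(self, n_states, n_actions, alpha=0.1, gamma=0.9,
                 epsilon=0.1, temperature=1.0):
        self.alpha = alpha            # base learning rate
        self.gamma = gamma            # discount factor
        self.epsilon = epsilon        # exploration probability
        self.temperature = temperature  # Boltzmann temperature
        self.q = [[0.0] * n_actions for _ in range(n_states)]
        self.visits = [[0] * n_actions for _ in range(n_states)]
        self.best_hits = [[0] * n_actions for _ in range(n_states)]

    def select_action(self, state):
        # Combined strategy: with probability epsilon, explore via a
        # Boltzmann (softmax) draw, which spreads exploration over
        # near-good actions instead of uniformly; otherwise act greedily.
        if random.random() < self.epsilon:
            prefs = [math.exp(q / self.temperature) for q in self.q[state]]
            total = sum(prefs)
            r, acc = random.random() * total, 0.0
            for a, p in enumerate(prefs):
                acc += p
                if acc >= r:
                    return a
            return len(prefs) - 1
        return max(range(len(self.q[state])), key=lambda a: self.q[state][a])

    def update(self, state, action, reward, next_state, on_best_episode):
        # f is the frequency with which (state, action) lay on an episode
        # attaining the best global cumulative reward observed so far;
        # the (1 + f) factor is an assumed way to let that frequency
        # amplify the update, in the spirit of the MRF rule.
        self.visits[state][action] += 1
        if on_best_episode:
            self.best_hits[state][action] += 1
        f = self.best_hits[state][action] / self.visits[state][action]
        target = reward + self.gamma * max(self.q[next_state])
        self.q[state][action] += self.alpha * (1 + f) * (target - self.q[state][action])
```

In a multi-AGV simulation, each AGV would own one such agent; whether the just-finished episode beat the best global return is decided centrally after the episode and passed in as `on_best_episode`, so learning itself stays fully decentralized.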
Authors: LIU Hui (刘辉), XIAO Ke (肖克), WANG Jing-bo (王京擘), Department of Automation, Qingdao University, Qingdao 266071, China
Source: Automation & Instrumentation (《自动化与仪表》), 2020, No. 2, pp. 84-89 (6 pages)
Funding: Shandong Provincial Natural Science Foundation (ZR2017PF005); Qingdao Postdoctoral Applied Research Project
Keywords: multi-agent reinforcement learning; AGV path planning; independent reinforcement learning
