Reinforcement Learning Online Car-Hailing Order Dispatch Based on Joint Q-value Decomposition (cited by: 5)
Abstract: Unreasonable dispatch of online car-hailing orders reduces resource utilization and travel efficiency. Based on a joint Q-value function decomposition framework, two order dispatch methods, ODDRL and LF-ODDRL, are proposed to efficiently dispatch user order requests to suitable online car-hailing drivers and to shorten passenger waiting times as much as possible. To capture the dynamic relationship between random demand and supply in the order dispatch scenario, the city is defined as a map of quadrilateral grid cells and each vehicle is treated as an independent agent. A multi-agent Markov Decision Process (MDP) model is constructed, and the agents are trained by maximizing entropy together with cumulative reward. The joint Q-value function of the agents is transformed into an easily decomposable function so that the actions in the joint Q-value function are consistent with those in the value function of each individual agent. In addition, an action search function is designed. Combining the advantages of centralized training and decentralized execution, each vehicle solves the order matching problem in a distributed manner without coordinating with other vehicles, which reduces complexity. Experimental results show that, compared with Random, Greedy, QMIX, and other methods, the proposed ODDRL and LF-ODDRL scale better. On a 500×500 grid with 10 passengers and 2 vehicles, the total time spent picking up and delivering passengers is shortened by 5% and 12%, respectively, relative to QMIX.
Authors: HUANG Xiaohui; ZHANG Xiong; YANG Kaiming; XIONG Liyan (School of Information Engineering, East China Jiaotong University, Nanchang 330013, China)
Source: Computer Engineering (《计算机工程》), indexed in CAS, CSCD, and the Peking University Core Journals list; 2022, No. 12, pp. 296-303, 311 (9 pages)
Funding: National Natural Science Foundation of China (62062033, 62067002, 61967006); Jiangxi Provincial Natural Science Foundation Youth Key Project (20192ACBL21006); Jiangxi Provincial Natural Science Foundation General Project (20212BAB202008).
Keywords: multi-agent; reinforcement learning; value function; order dispatch; neural network
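The consistency property mentioned in the abstract (the greedy joint action agrees with each agent's own greedy action) can be illustrated with a small sketch. This is not the ODDRL or LF-ODDRL implementation: the abstract does not specify the decomposition, so the sketch assumes a simple additive one (the joint Q-value is the sum of per-agent values), and the per-agent Q-values are a hypothetical stand-in (negative Manhattan distance on the grid) rather than outputs of a trained network. All function names are illustrative.

```python
from itertools import product

def manhattan(a, b):
    """Grid distance between two cells given as (row, col)."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def per_agent_q(vehicle, orders):
    """Hypothetical per-agent Q-values: one value per order, higher when closer.
    A trained value network would normally produce these numbers."""
    return [-manhattan(vehicle, o) for o in orders]

def joint_q(q_tables, joint_action):
    """Assumed additive decomposition: joint Q is the sum of per-agent values."""
    return sum(q[a] for q, a in zip(q_tables, joint_action))

def decentralized_actions(q_tables):
    """Each agent picks its own argmax without looking at the other agents."""
    return tuple(max(range(len(q)), key=q.__getitem__) for q in q_tables)

def centralized_actions(q_tables):
    """Brute-force argmax over the joint action space, for comparison only."""
    n_actions = len(q_tables[0])
    return max(product(range(n_actions), repeat=len(q_tables)),
               key=lambda ja: joint_q(q_tables, ja))

# Two vehicles and three open orders on a small grid (made-up positions).
vehicles = [(0, 0), (4, 4)]
orders = [(1, 0), (4, 3), (2, 2)]

q_tables = [per_agent_q(v, orders) for v in vehicles]
local = decentralized_actions(q_tables)
central = centralized_actions(q_tables)
print(local, central, joint_q(q_tables, local))
```

Under this assumed decomposition, the per-agent choice and the brute-force joint choice coincide, which is what lets each vehicle act without coordinating at execution time. The sketch ignores details the paper addresses, such as two vehicles selecting the same order, the action search function, and entropy-regularized training.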