Abstract
Dynamic online task allocation strategies struggle to learn from historical data and ignore the effect of current decisions on future revenue. To address this, a spatial crowdsourcing task allocation strategy based on deep reinforcement learning is proposed. First, with the maximization of long-term cumulative revenue as the optimization objective, the problem is modeled as a Markov decision process from the perspective of a single crowdsourcing worker, which turns task allocation into estimating the state-action value Q and computing a one-to-one assignment between workers and tasks. Second, an improved deep reinforcement learning algorithm learns offline from historical task data to build a prediction model for the Q value. Finally, during dynamic online allocation, Q values are predicted in real time and used as edge weights in the KM (Kuhn-Munkres) algorithm, yielding an assignment that optimizes the global cumulative revenue. Experiments on a real-world taxi trip dataset show that the proposed strategy improves long-term cumulative revenue when the number of workers is within a certain scale.
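The online step described in the abstract, using predicted Q values as edge weights for a one-to-one worker-task assignment, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `km_assign` and the matrix `q_values` are hypothetical, and the Q values are hard-coded here, whereas in the paper they would come from the trained prediction model. The sketch uses a standard O(n^3) potential-based form of the Kuhn-Munkres (Hungarian) algorithm, run on negated weights so that the total predicted Q is maximized.

```python
def km_assign(q):
    """Return {worker: task} maximizing the sum of q[worker][task].

    q is a rectangular list of lists of predicted Q values (hypothetical
    input; the paper obtains these from its learned model).
    """
    n_workers, n_tasks = len(q), len(q[0])
    if n_workers > n_tasks:
        # The routine below needs rows <= columns, so solve the transposed
        # problem and invert the resulting {task: worker} mapping.
        transposed = [[q[w][t] for w in range(n_workers)] for t in range(n_tasks)]
        return {w: t for t, w in km_assign(transposed).items()}

    n, m = n_workers, n_tasks
    cost = [[-x for x in row] for row in q]  # maximize Q == minimize -Q
    inf = float("inf")
    u = [0.0] * (n + 1)   # row potentials (1-indexed)
    v = [0.0] * (m + 1)   # column potentials (1-indexed)
    p = [0] * (m + 1)     # p[j] = row currently matched to column j (0 = none)
    way = [0] * (m + 1)   # predecessor column on the augmenting path

    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [inf] * (m + 1)
        used = [False] * (m + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], inf, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(m + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        # Flip the matching along the augmenting path that was found.
        while True:
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
            if j0 == 0:
                break

    return {p[j] - 1: j - 1 for j in range(1, m + 1) if p[j] != 0}


# Illustrative values only: rows are workers, columns are tasks.
q_values = [
    [5.0, 4.0, 1.0],
    [4.5, 1.0, 0.5],
    [1.0, 3.0, 2.0],
]
assignment = km_assign(q_values)
total_q = sum(q_values[w][t] for w, t in assignment.items())
print(assignment, total_q)
```

In this toy matrix, letting each worker greedily take its highest-valued free task in turn gives a total of 8.0, while the matching found above gives 10.5, which is why a global assignment step is used instead of per-worker greedy choices.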
Authors
倪志伟
刘浩
朱旭辉
赵杨
冉家敏
NI Zhiwei; LIU Hao; ZHU Xuhui; ZHAO Yang; RAN Jiamin (School of Management, Hefei University of Technology, Hefei 230009; Key Laboratory of Process Optimization and Intelligent Decision-Making, Ministry of Education, Hefei University of Technology, Hefei 230009)
Source
《模式识别与人工智能》
EI
CSCD
PKU Core Journals
2021, No. 3, pp. 191-205 (15 pages)
Pattern Recognition and Artificial Intelligence
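Funding
National Natural Science Foundation of China (No. 91546108, 71901001, 71521001)
Major Science and Technology Project of Anhui Province (No. 201903a05020020)
Natural Science Foundation of Anhui Province (No. 1908085QG298)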
Keywords
Spatial Crowdsourcing
Task Allocation
Multi-stage Sequential Decision-Making
Deep Reinforcement Learning