摘要
带时间窗的旅行商问题(traveling salesman problem with time window, TSPTW)是旅行商问题的一个变种,在物资配送等方面有大量的应用。传统方法的求解时间较长且泛化性较差,为提高TSPTW的求解效率,将求解过程建模为马尔科夫决策过程,定义了状态、动作、奖励,提出了一种基于深度强化学习的Transformer加指针网络的组合模型,通过多头注意力对输入的特征进行编码,采用指针网络求出解的概率分布,所提深度学习网络通过强化学习算法进行训练。实验结果表明:所提方法对比传统的启发式求解算法,可以得到更高质量的解,相较于求解器和启发式算法,有超过数10倍的提升效果,且易于将模型拓展到不同规模的问题上。
The Traveling Salesman Problem with Time Window(TSPTW),widely applied in material distribution,is a variant of the traveling salesman problem.To remedy such problems as long solution time and poor generalization of the traditional method as well as to to improve the solution efficiency of TSPTW,this paper models the solution process as a Markov decision process,defines the state,action and reward,and proposes a deep reinforcement learning based Transformer+pointer network model,which encodes the input features through multi-head attention,and employs the pointer network to work out the probability distribution of the solution.The deep learning network is trained by reinforcement learning algorithm.The experimental results show the proposed method obtains higher quality solutions compared with the traditional heuristic algorithms.Moreover,it markedly improves the final results and easily transfers the model to other problems of different scales compared with solvers and traditional heuristic algorithms.
作者
江明
刘志威
JIANG Ming;LIU Zhiwei(School of Internet Economics and Business,Fujian University of Technology,Fuzhou 350118,China;School of Transportation,Fujian University of Technology,Fuzhou 350118,China)
出处
《重庆理工大学学报(自然科学)》
CAS
北大核心
2023年第12期260-266,共7页
Journal of Chongqing University of Technology:Natural Science
基金
国家社会科学基金项目(22BGL007)
福建省社会科学基金项目(FJ2020B038)
福建省习近平新时代中国特色社会主义思想研究中心项目(GY-S21118)。
关键词
带时间窗的旅行商
深度强化学习
组合优化
注意力机制
traveling salesman with time window
deep reinforcement learning
combined optimization
attention mechanism