A Planning Network Model Based on Generalized Asynchronous Value Iteration
Abstract: In recent years, generating policies that generalize has become one of the central problems in deep reinforcement learning, and many related results have appeared. A representative example is the generalized value iteration network (GVIN). GVIN is a differentiable planning network that can operate on irregular graphs. It uses a special graph convolution operator to approximate the state-transition matrix, so that once it has learned the structural information of an irregular graph it can plan through the value iteration (VI) process and thereby produce policies that generalize in tasks with irregular graph structure. However, GVIN does not allocate planning time according to the importance of states: each round of VI updates the values of all states in the state space synchronously. When the state space is large, these synchronous updates can degrade the planning performance of the network. This work applies the idea of asynchronous updates to GVIN. By defining a priority for each state and performing asynchronous value updates during VI, it proposes a new asynchronous planning network model, the generalized asynchronous value iteration network (GAVIN). On unseen tasks with irregular graph structure, GAVIN plans more efficiently and more effectively than GVIN. In addition, this work improves the reinforcement learning algorithm and the graph convolution operator used in GVIN, and verifies the effectiveness of these improvements through path-planning experiments on irregular graphs and real maps.
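The contrast between synchronous and priority-driven asynchronous value updates can be illustrated with a small tabular sketch. This is not the GAVIN network itself: it has no graph convolution and no learning. Several things here are assumptions made for illustration and do not come from the abstract: the priority is taken to be the absolute Bellman error, the graph is deterministic with edge weights acting as costs, and the names `GRAPH`, `bellman_backup`, `synchronous_vi`, `prioritized_async_vi`, and `greedy_path` are hypothetical.

```python
import heapq

# Hypothetical toy graph: node -> list of (neighbor, edge cost). "D" is the goal.
GRAPH = {
    "A": [("B", 1.0), ("C", 4.0)],
    "B": [("A", 1.0), ("C", 1.0), ("D", 5.0)],
    "C": [("A", 4.0), ("B", 1.0), ("D", 1.0)],
    "D": [],
}


def bellman_backup(graph, values, state, goal, gamma):
    """One-step lookahead value of `state`; the reward for an edge is -cost.

    Assumes every non-goal node has at least one outgoing edge.
    """
    if state == goal:
        return 0.0
    return max(-cost + gamma * values[nxt] for nxt, cost in graph[state])


def synchronous_vi(graph, goal, gamma=1.0, tol=1e-9, max_sweeps=1000):
    """Baseline: every sweep backs up all states, whether or not they need it."""
    values = {s: 0.0 for s in graph}
    for _ in range(max_sweeps):
        new_values = {s: bellman_backup(graph, values, s, goal, gamma) for s in graph}
        delta = max(abs(new_values[s] - values[s]) for s in graph)
        values = new_values
        if delta <= tol:
            break
    return values


def prioritized_async_vi(graph, goal, gamma=1.0, tol=1e-9, max_updates=10_000):
    """Asynchronous VI: back up one state at a time, largest Bellman error first."""
    values = {s: 0.0 for s in graph}
    # Predecessor lists tell us whose Bellman error may change after an update.
    preds = {s: [] for s in graph}
    for s, edges in graph.items():
        for nxt, _ in edges:
            preds[nxt].append(s)

    def error(s):
        return abs(bellman_backup(graph, values, s, goal, gamma) - values[s])

    # heapq is a min-heap, so priorities are stored negated.
    heap = [(-error(s), s) for s in graph if error(s) > tol]
    heapq.heapify(heap)
    updates = 0
    while heap and updates < max_updates:
        _, s = heapq.heappop(heap)
        new_value = bellman_backup(graph, values, s, goal, gamma)
        if abs(new_value - values[s]) <= tol:
            continue  # stale queue entry: this state no longer needs an update
        values[s] = new_value
        updates += 1
        for p in preds[s]:
            err = error(p)
            if err > tol:
                heapq.heappush(heap, (-err, p))
    return values, updates


def greedy_path(graph, values, start, goal, gamma=1.0):
    """Follow the greedy policy induced by `values` from `start` to `goal`."""
    path = [start]
    while path[-1] != goal and len(path) <= len(graph):
        nxt, _ = max(graph[path[-1]], key=lambda e: -e[1] + gamma * values[e[0]])
        path.append(nxt)
    return path
```

On this toy graph both procedures reach the same fixed point; the asynchronous variant spends its backups only on states whose values are still changing, which is the intuition behind allocating planning effort by state importance. How GAVIN actually defines and learns priorities is specified in the paper, not here.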
Authors: CHEN Zi-Xuan; ZHANG Zong-Zhang; PAN Zhi-Yuan; ZHANG Lin-Jing (State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210023, China; School of Computer Science and Technology, Soochow University, Suzhou 215006, China)
Source: Journal of Software (《软件学报》), indexed in EI, CSCD, and the Peking University Core Journals list; 2021, No. 11, pp. 3496-3511 (16 pages)
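Funding: National Natural Science Foundation of China (61876119); Natural Science Foundation of Jiangsu Province (BK20181432); Fundamental Research Funds for the Central Universities (022114380010).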
Keywords: deep learning; reinforcement learning; imitation learning; planning; asynchronous update