
Survey on Memory-augmented Deep Reinforcement Learning (cited by: 6)
Abstract: In recent years, deep reinforcement learning (DRL) has developed rapidly. To improve the ability of DRL to handle high-dimensional state spaces and dynamic, complex environments, researchers have introduced memory-augmented neural networks (MANN) into DRL and proposed a variety of memory-augmented deep reinforcement learning (MADRL) algorithms, making MADRL a current research hotspot. According to the type of MANN used, this paper divides MADRL algorithms into four classes: MADRL based on experience replay, MADRL based on memory networks, MADRL based on episodic memory, and MADRL based on the differentiable neural computer. It also systematically summarizes and analyzes the advantages and disadvantages of existing research on MADRL, and introduces the training environments commonly used for DRL. Finally, the prospects and future research directions of MADRL are discussed.
Authors: WANG Chen (汪晨); ZENG Fan-yu (曾凡玉); GUO Jiu-xia (郭九霞) (School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu 611731, China; College of Air Traffic Management, Civil Aviation Flight University of China, Guanghan 618307, China)
Source: Journal of Chinese Computer Systems (《小型微型计算机系统》), CSCD, PKU Core Journal, 2021, No. 3, pp. 454-461 (8 pages)
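Funding: Supported by the National Natural Science Foundation of China, Joint Fund Project (U181320052); the National Natural Science Foundation of China, General Program (6177020680); the National Natural Science Foundation of China, Young Scientists Fund (62003381); the National Key Research and Development Program of China (2018YFC0831801); and the Sichuan Province Key Research and Development Project (17ZDYF3184).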
Keywords: deep reinforcement learning; experience replay; memory networks; episodic memory; differentiable neural computer
