
Application of Deep Reinforcement Learning to Atari Video Games (cited by: 3)
Abstract: Deep learning is well suited to image feature extraction. To improve the stability of deep learning on Atari games, this paper proposes a deep neural network structure based on model fusion, built on the combination of a convolutional neural network with an improved Q-learning algorithm from reinforcement learning. Experiments show that the new model fully learns the control policy and matches or exceeds the scores of ordinary deep reinforcement learning models on Atari games, which supports the stability and superiority of model-fusion deep reinforcement learning in video games.
Authors: 石征锦 (Shi Zhengjin), 王康 (Wang Kang)
Source: Electronics World (《电子世界》), 2017, No. 16, pp. 105-106, 109 (3 pages)
Keywords: reinforcement learning; deep learning; neural network; video game
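
The abstract names two ingredients, a Q-learning update and a fusion of several models, without giving details. The following is a minimal sketch under stated assumptions, not the authors' method: the fusion rule shown (averaging the Q-value estimates of several networks action by action) is an assumption, the target is the standard one-step Q-learning target rather than the paper's unspecified improved variant, and all function names and numbers are hypothetical.

```python
from statistics import mean


def q_learning_target(reward, next_q_values, gamma=0.99, terminal=False):
    """Standard one-step Q-learning target: r + gamma * max_a' Q(s', a')."""
    if terminal:
        return reward
    return reward + gamma * max(next_q_values)


def fuse_q_values(per_model_q_values):
    """Assumed fusion rule: average the models' Q-values action by action."""
    return [mean(qs) for qs in zip(*per_model_q_values)]


def greedy_action(q_values):
    """Index of the action with the largest Q-value."""
    return max(range(len(q_values)), key=lambda a: q_values[a])


# Hypothetical Q-value outputs of three networks for one state with 4 actions.
model_outputs = [
    [1.0, 2.0, 0.5, 0.0],
    [0.8, 1.5, 2.5, 0.1],
    [1.2, 2.5, 0.0, 0.2],
]

fused = fuse_q_values(model_outputs)
action = greedy_action(fused)
target = q_learning_target(reward=1.0, next_q_values=fused, gamma=0.9)
```

In this toy case the second network alone would pick action 2, while the averaged estimate picks action 1, which illustrates how fusion can damp the effect of a single model's outlier estimate.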
