A Survey of Hierarchical Reinforcement Learning
(分层强化学习研究综述)

Cited by: 7
Abstract: Reinforcement learning is an approach in which an agent improves its policy through trial-and-error interaction with a dynamic environment. Its self-learning and online-learning capabilities have made it an important branch of machine learning. Reinforcement learning has, however, long been bedeviled by the "curse of dimensionality". In recent years, hierarchical reinforcement learning has made significant progress against the curse of dimensionality by introducing abstraction mechanisms. As the theoretical basis, this paper first presents the principles of reinforcement learning and the Q-learning algorithm based on the semi-Markov decision process (SMDP). It then reviews three typical single-agent hierarchical reinforcement learning approaches (Option, HAM, and MAXQ), covering their main ideas and Q-learning update formulas, summarizes the essential characteristics of each approach, and compares and evaluates the three. Finally, it points out the problems that remain to be solved when extending single-agent hierarchical reinforcement learning to the multi-agent setting.
Source: Pattern Recognition and Artificial Intelligence (《模式识别与人工智能》), 2005, No. 5: 574-581 (8 pages). Indexed in EI and CSCD; PKU Core journal.
Keywords: Hierarchical Reinforcement Learning, Semi-Markov Decision Process, Q-Learning, Multi-Agent System
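The Q-learning update formulas mentioned in the abstract follow the semi-Markov decision process (SMDP) formulation of Sutton, Precup, and Singh (reference 6 below). As a worked illustration, the SMDP Q-learning backup for an option o that starts in state s, runs for tau primitive steps while accumulating discounted reward R, and terminates in state s' can be written in LaTeX as:

    Q(s,o) \leftarrow Q(s,o) + \alpha \left[ R + \gamma^{\tau} \max_{o'} Q(s',o') - Q(s,o) \right]

Below is a minimal tabular sketch of this backup in Python; the function name, state labels, and option names are hypothetical illustrations, not drawn from the paper:

    from collections import defaultdict

    def smdp_q_update(Q, s, o, R, tau, s_next, options, alpha=0.1, gamma=0.9):
        """One SMDP Q-learning backup after option o has terminated.

        R:   discounted reward accumulated while o executed,
             i.e. r_1 + gamma*r_2 + ... + gamma**(tau-1)*r_tau.
        tau: number of primitive time steps o took to terminate.
        """
        target = R + (gamma ** tau) * max(Q[(s_next, op)] for op in options)
        Q[(s, o)] += alpha * (target - Q[(s, o)])

    # Hypothetical usage with two options in a rooms-style task.
    Q = defaultdict(float)                     # unseen (state, option) pairs start at 0.0
    options = ["go_to_doorway", "go_to_goal"]
    smdp_q_update(Q, s="room1", o="go_to_doorway", R=0.5, tau=4,
                  s_next="hallway", options=options)

When every option lasts exactly one step (tau = 1), the backup reduces to ordinary one-step Q-learning, which is why flat Q-learning is a special case of the hierarchical methods surveyed here.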
Related Literature

References (46)

  • 1 Gao Yang, Chen Shifu, Lu Xin. A Survey of Reinforcement Learning [J]. Acta Automatica Sinica, 2004, 30(1): 86-100. (in Chinese)
  • 2 Singh S P, Jaakkola T, Jordan M I. Reinforcement Learning with Soft State Aggregation. In: Tesauro G, Touretzky D S, Leen T K, eds. Advances in Neural Information Processing Systems 7. Cambridge, USA: MIT Press, 1995: 361-368.
  • 3 Moriarty D, Schultz A, Grefenstette J. Evolutionary Algorithms for Reinforcement Learning. Journal of Artificial Intelligence Research, 1999, 11: 241-276.
  • 4 Bertsekas D P, Tsitsiklis J N. Neuro-Dynamic Programming. Belmont, USA: Athena Scientific, 1996.
  • 5 Barto A G, Mahadevan S. Recent Advances in Hierarchical Reinforcement Learning. Discrete Event Dynamic Systems: Theory and Applications, 2003, 13(4): 41-77.
  • 6 Sutton R S, Precup D, Singh S P. Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning. Artificial Intelligence, 1999, 112(1-2): 181-211.
  • 7 Parr R. Hierarchical Control and Learning for Markov Decision Processes. Ph.D. Dissertation. Berkeley, USA: University of California, 1998.
  • 8 Dietterich T G. Hierarchical Reinforcement Learning with the MAXQ Value Function Decomposition. Journal of Artificial Intelligence Research, 2000, 13: 227-303.
  • 9 Minsky M L. Theory of Neural-Analog Reinforcement Systems and Its Application to the Brain-Model Problem. Ph.D. Dissertation. Princeton, USA: Princeton University, 1954.
  • 10 Zhang Rubo, ed. Reinforcement Learning Theory and Applications [M]. Harbin: Harbin Engineering University Press, 2001: 287. (in Chinese)

Secondary References (52)

  • 1 Hewitt C. Viewing Control Structures as Patterns of Passing Messages. Artificial Intelligence, 1977, 8(3): 323-364.
  • 2 Wooldridge M, Jennings N R. Agent Theories, Architectures, and Languages: A Survey. In: Wooldridge M, Jennings N R, eds. Intelligent Agents. Berlin: Springer-Verlag, 1995: 1-22.
  • 3 Weiß G. Learning to Coordinate Actions in Multi-Agent Systems. In: Proceedings of IJCAI'93, 1993.
  • 4 Dworman G, Kimbrough S, Laing J. Bargaining by Artificial Agents in Two Coalition Games: A Study in Genetic Programming for Electronic Commerce. In: Proc. of the AAAI Genetic Programming Conf. Stanford, CA, 1996.
  • 5 Kaelbling L P, Littman M L, Moore A W. Reinforcement Learning: A Survey. Journal of Artificial Intelligence Research, 1996, 4: 237-285.
  • 6 Singh S. Agents and Reinforcement Learning. San Mateo, CA, USA: Miller Freeman, 1997.
  • 7 Bellman R. Dynamic Programming. Englewood Cliffs, NJ: Prentice-Hall, 1957.
  • 8 Sutton R S. Learning to Predict by the Methods of Temporal Differences. Machine Learning, 1988, 3: 9-44.
  • 9 Sutton R S. Convergence Theory for a New Kind of Prediction Learning. In: Proc. of the 1988 Workshop on Computational Learning Theory, 1988: 421-442.
  • 10 Watkins C J C H, Dayan P. Q-Learning. Machine Learning, 1992, 8(3-4): 279-292.

