Abstract
Reinforcement learning is an approach in which an agent learns its behavior through trial-and-error interaction with a dynamic environment. Its self-learning and online learning capabilities have made it an important branch of machine learning, but it has long been bedeviled by the curse of dimensionality. Recently, hierarchical reinforcement learning has made notable progress in combating the curse of dimensionality by employing abstraction mechanisms. As the theoretical basis, this paper first introduces the principles of reinforcement learning and Q-learning based on the Semi-Markov Decision Process (SMDP). It then reviews three typical single-agent hierarchical reinforcement learning approaches, namely Option, HAM, and MAXQ, covering their main ideas, Q-learning update formulas, and essential characteristics, and gives a comparative evaluation of the three. Finally, the open problems to be solved when extending single-agent hierarchical reinforcement learning approaches to multi-agent systems are discussed.
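The abstract refers to a Q-learning update formula based on the SMDP model, where an option may run for several primitive time steps before control returns. A minimal sketch of the standard SMDP Q-learning update is given below; the function and parameter names (`smdp_q_update`, `alpha`, `gamma`, `tau`) are illustrative assumptions, not taken from the paper.

```python
from collections import defaultdict

def smdp_q_update(Q, s, o, r, s_next, tau, alpha=0.1, gamma=0.9):
    # Hypothetical sketch of one SMDP Q-learning update.
    # r   : cumulative discounted reward accrued while option o was executing
    # tau : number of primitive steps option o took (the SMDP holding time)
    best_next = max(Q[s_next].values(), default=0.0)
    # SMDP Q-learning target: r + gamma^tau * max_o' Q(s', o')
    Q[s][o] += alpha * (r + gamma ** tau * best_next - Q[s][o])
    return Q[s][o]

# Toy usage: two states, one option, after a 3-step option execution.
Q = defaultdict(dict)
Q["s0"] = {"opt_a": 0.0}
Q["s1"] = {"opt_a": 1.0}
smdp_q_update(Q, "s0", "opt_a", r=0.5, s_next="s1", tau=3)
```

The only difference from flat Q-learning is the `gamma ** tau` factor, which discounts the bootstrapped value by the option's actual duration rather than a single step.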
Source
Pattern Recognition and Artificial Intelligence (《模式识别与人工智能》)
Indexed in: EI, CSCD, Peking University Core Journals (北大核心)
2005, No. 5, pp. 574-581 (8 pages)
Keywords
Hierarchical Reinforcement Learning, Semi-Markov Decision Process, Q-Learning, Multi-Agent System