Abstract
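Options is a reinforcement learning algorithm that introduces temporal abstraction and is closely tied to the SMDP model. An important and still open problem for this algorithm is how to let an agent find suitable options autonomously. This paper first proposes a subgoal-discovery algorithm based on the rate of change of the drop in visit frequency, which overcomes the low accuracy and partial dependence on human input of existing algorithms. Building on that algorithm, it then proposes a procedure for constructing options and applies it to maze problems. The experimental results show that the options generated in the experiments can greatly speed up learning.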
The classical options algorithm provides a natural way of incorporating macro-actions into the Semi-Markov Decision Process (SMDP) framework. However, it immediately raises the question of how to recognise appropriate options automatically. This paper presents a method that finds sub-goals based on the slope of the visit-frequency curve. Options are then built automatically from the sub-goals found in the previous step. The algorithm overcomes shortcomings of previous methods, such as low accuracy and reliance on human intervention. We illustrate the algorithm with several grid-world navigation tasks. The experimental results show that using the generated options substantially improves learning efficiency.
Source
《模式识别与人工智能》
EI
CSCD
PKU Core Journal (北大核心)
2005, No. 6, pp. 679-684 (6 pages)
Pattern Recognition and Artificial Intelligence
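Funding
Supported by:
National Natural Science Foundation of China (Nos. 60103012, 60475026)
National Key Basic Research Program of China (973 Program) (No. 2002CB312002)
Natural Science Foundation of Jiangsu Province (No. BK20034079)
Jiangsu Province Innovative Talent Program (No. BK2003409)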
Keywords
Reinforcement Learning, Markov Decision Processes, Options, Semi-Markov Decision Processes, Subgoals