
Research on an Algorithm for Autonomously Generating Options in the SMDP Framework (cited by: 9)

The Study of Recognizing Options Based on SMDP
Abstract: Options are a reinforcement-learning mechanism for temporal abstraction that is closely tied to the Semi-Markov Decision Process (SMDP) model. An important open problem is how an agent can find suitable options autonomously. This paper first proposes a sub-goal discovery algorithm based on the rate of change of visit frequency (the slope of the visit-frequency curve), which overcomes the low accuracy and partial reliance on human input of existing approaches. Building on this, a procedure is given for automatically constructing options from the discovered sub-goals, and it is applied to grid-world maze navigation tasks. Experimental results show that the generated options significantly accelerate learning.
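The record gives only the summary above, so the following Python sketch illustrates the general idea rather than the authors' exact procedure: sub-goal states are proposed where the sorted visit-frequency curve drops sharply, and each sub-goal is then wrapped into an option in the Sutton/Precup/Singh sense. The function names, the slope_threshold parameter, and the trajectory format are illustrative assumptions.

```python
# Minimal sketch (assumptions, not the paper's published algorithm):
# sub-goal discovery from the slope of the visit-frequency curve,
# followed by simple option construction around each sub-goal.
from collections import Counter

import numpy as np


def find_subgoals(trajectories, slope_threshold=0.02):
    """Propose sub-goal states from exploration trajectories.

    States are ranked by visit frequency; a sharp drop (steep negative slope)
    in the sorted frequency curve separates heavily visited "bottleneck"
    states, such as doorways in a grid world, from ordinary states. States
    sitting just before the steep drops are returned as sub-goal candidates.
    """
    counts = Counter(s for traj in trajectories for s in traj)
    states, raw = zip(*counts.most_common())        # most visited first
    freqs = np.array(raw, dtype=float) / sum(raw)   # normalise to frequencies
    slopes = np.diff(freqs)                         # slope between ranked states
    return [states[i] for i, slope in enumerate(slopes) if slope < -slope_threshold]


def build_option(subgoal, trajectories):
    """Assemble an option <I, pi, beta> around one sub-goal.

    The initiation set I contains every state observed before the sub-goal on
    some trajectory; beta terminates the option exactly at the sub-goal. The
    internal policy pi would be learned separately, e.g. by Q-learning with a
    pseudo-reward of +1 on reaching the sub-goal (omitted here).
    """
    initiation_set = set()
    for traj in trajectories:
        if subgoal in traj:
            initiation_set.update(traj[: traj.index(subgoal)])
    beta = lambda s: 1.0 if s == subgoal else 0.0
    return initiation_set, beta
```

In a four-room grid world, for instance, the doorway cells would typically dominate the top of the frequency curve, and the options built around them let the agent move between rooms in a single SMDP-level decision, which is the kind of speed-up the abstract reports.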
Source: Pattern Recognition and Artificial Intelligence (《模式识别与人工智能》; indexed in EI, CSCD, PKU Core), 2005, No. 6, pp. 679-684 (6 pages).
Funding: National Natural Science Foundation of China (No. 60103012, 60475026), National Key Basic Research and Development Program of China (973 Program, No. 2002CB312002), Natural Science Foundation of Jiangsu Province (No. BK20034079), and the Jiangsu Province Innovative Talent Program (No. BK2003409).
Keywords: Reinforcement Learning, Markov Decision Processes, Options, Semi-Markov Decision Processes, Sub-goals
