
Research on Optimality Criteria in Reinforcement Learning

Cited by: 8
Abstract: Reinforcement learning (RL) agents solve sequential decision problems by learning and planning optimal policies for choosing actions, so the definition of what it means for a policy to be "optimal" is one of the core questions in RL research. This paper discusses a variety of optimality criteria from the dynamic programming literature, examines their suitability and characteristics for RL through examples, and analyzes the necessity of devising RL algorithms for the various criteria.
Source: Computer Engineering & Science (CSCD), 2001, No. 2, pp. 62-65 (4 pages).
Keywords: reinforcement learning, agent, optimality criterion, learning algorithm, artificial intelligence, Markov decision process
Related Literature

References (1)

  • 1. Zhang W. Proc of the 14th IJCAI, 1995: 1114.

Co-cited Literature (36)

  • 1. Bertsekas D P (trans. Li Renhou). Dynamic Programming: Deterministic and Stochastic Models [M]. Xi'an: Xi'an Jiaotong University Press, 1990.
  • 2. Sutton R S, Barto A G. Reinforcement Learning: An Introduction [M]. Cambridge, MA: MIT Press, 1998.
  • 3. Watkins C J C H, Dayan P. Q-learning [J]. Machine Learning, 1992, 8(3): 279-292.
  • 4. Sutton R S. Learning to predict by the methods of temporal differences [J]. Machine Learning, 1988, 3(1): 9-44.
  • 5. Peng J, Williams R. Incremental multi-step Q-learning [J]. Machine Learning, 1996, 22(4): 283-290.
  • 6. Watkins C J C H. Learning from Delayed Rewards [D]. University of Cambridge, England, 1989.
  • 7. Wiering M, Schmidhuber J. Speeding up Q-learning [C]. Proc of the 10th European Conf on Machine Learning, 1998.
  • 8. Sutton R S. Open theoretical questions in reinforcement learning [C]. Proc of EuroCOLT'99 (Computational Learning Theory). Cambridge, MA: MIT Press, 1999: 11-17.
  • 9. Singh S. Reinforcement learning algorithm for average-payoff Markovian decision processes [C]. Proc of the 12th AAAI, 1994.

Citing Literature (8)

Secondary Citing Literature (5)
