
Algorithm of stable state spaces in reinforcement learning
Abstract: In reinforcement learning, the number of exploration trials grows exponentially as the state space expands, which makes such algorithms difficult to apply to the control of complex systems. To overcome this problem, a reinforcement learning algorithm based on stable state space control is proposed. The algorithm takes the optimal control actions of the stable state space as its learning goal and concentrates exploration on the stable state space rather than on the whole state space. Since the stable state space is usually only a tiny fraction of the system's state space, the number of trials does not grow exponentially as the state space expands.
Source: Journal of Computer Applications (《计算机应用》, CSCD, Peking University core journal), 2008, No. 5, pp. 1328-1330 and 1343 (4 pages in total).
Funding: National Natural Science Foundation of China (60373029).
Keywords: reinforcement learning; Markov Decision Process (MDP); stable state; inverted pendulum
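
The abstract states the idea only at a high level, so here is a minimal sketch, in Python, of what confining Q-learning exploration to a stable state space could look like for the inverted pendulum named in the keywords. Everything in it is an illustrative assumption rather than the paper's actual algorithm: the toy dynamics in step, the stability test is_stable, the discretize grid, the failure penalty, and all constants.

```python
import math
import random
from collections import defaultdict

# Hypothetical sketch (not the paper's algorithm): tabular Q-learning whose
# exploration is confined to an assumed "stable" region of an inverted-
# pendulum state space. Dynamics, stability test, discretization, reward,
# and all constants below are illustrative assumptions.

ACTIONS = (-10.0, 0.0, 10.0)          # candidate control forces (assumed)
ALPHA, GAMMA, EPSILON, DT = 0.5, 0.95, 0.1, 0.02
G, LENGTH = 9.8, 0.5                  # toy pendulum parameters (assumed)

def step(theta, omega, force):
    """Greatly simplified pendulum dynamics; theta = 0 is upright."""
    accel = (G * math.sin(theta) + force * math.cos(theta)) / LENGTH
    omega += accel * DT
    theta += omega * DT
    return theta, omega

def is_stable(theta, omega):
    """Assumed stable set: pole near upright and moving slowly."""
    return abs(theta) < 0.2 and abs(omega) < 1.0

def discretize(theta, omega):
    """Coarse grid over the stable set only; no cells exist outside it."""
    return (round(theta / 0.05), round(omega / 0.25))

Q = defaultdict(float)                # Q[(state, action)] -> value estimate

def choose_action(state):
    """Epsilon-greedy over the candidate forces."""
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])

def run_episode(max_steps=500):
    """One episode; leaving the stable set ends it with a penalty, so all
    exploration stays inside the stable region of the state space."""
    theta, omega = random.uniform(-0.05, 0.05), 0.0
    for _ in range(max_steps):
        s = discretize(theta, omega)
        a = choose_action(s)
        theta, omega = step(theta, omega, a)
        if not is_stable(theta, omega):
            Q[(s, a)] += ALPHA * (-1.0 - Q[(s, a)])   # failure: left stable set
            return False
        s2 = discretize(theta, omega)
        target = GAMMA * max(Q[(s2, b)] for b in ACTIONS)
        Q[(s, a)] += ALPHA * (target - Q[(s, a)])
    return True

if __name__ == "__main__":
    successes = sum(run_episode() for _ in range(2000))
    print(f"episodes that stayed stable: {successes}/2000")
```

The abstract's complexity claim shows up here as table size: under these assumptions, Q only ever holds cells inside the stable set (a grid of roughly 9 × 9 states), no matter how large the surrounding state space is.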


