Stable State Space Control for Reinforcement Learning Algorithms (强化学习算法的稳定状态空间控制)

Algorithm of stable state spaces in reinforcement learning

Abstract: In reinforcement learning, the number of exploration trials grows exponentially with the size of the state space, which makes the approach hard to apply to the control of complex systems. To overcome this problem, the paper proposes a reinforcement learning algorithm based on stable state space control. The algorithm takes finding the optimal control actions within the stable state space as its learning objective and concentrates exploration on the stable state space rather than on the system's entire state space. Because the stable state space usually occupies only a very small fraction of the full state space, the number of trials the algorithm needs does not grow exponentially as the state space expands.
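
The abstract gives no pseudocode, so the following is only an illustrative sketch of the general idea and not the authors' algorithm. It assumes a toy one-dimensional system, plain tabular Q-learning, and a stable set that is known in advance; the function name, the dynamics, the reward, and every parameter are hypothetical. An episode ends with a penalty as soon as the state leaves the stable set, so the Q-table only ever holds entries for stable states.

```python
import random


def q_learning_in_stable_region(n_states=101, stable=range(45, 56),
                                episodes=200, steps=50,
                                alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning whose exploration is confined to `stable`.

    All names, dynamics and rewards are illustrative assumptions,
    not taken from the paper.
    """
    rng = random.Random(seed)
    stable_list = sorted(stable)
    stable_set = set(stable_list)
    if not all(0 <= s < n_states for s in stable_list):
        raise ValueError("stable states must lie inside the state space")

    actions = (-1, 1)  # push left / push right
    # The table is allocated for stable states only, not for all n_states.
    q = {(s, a): 0.0 for s in stable_list for a in actions}
    visited = set()

    for _ in range(episodes):
        s = rng.choice(stable_list)  # every episode starts in a stable state
        for _ in range(steps):
            visited.add(s)
            # Epsilon-greedy action selection.
            if rng.random() < epsilon:
                a = rng.choice(actions)
            else:
                a = max(actions, key=lambda act: q[(s, act)])
            # Toy dynamics: chosen push plus a random disturbance.
            s_next = s + a + rng.choice((-1, 0, 1))
            if s_next not in stable_set:
                # Leaving the stable set is terminal and penalised (reward -1).
                q[(s, a)] += alpha * (-1.0 - q[(s, a)])
                break
            # Staying inside the stable set earns reward 0.
            best_next = max(q[(s_next, b)] for b in actions)
            q[(s, a)] += alpha * (gamma * best_next - q[(s, a)])
            s = s_next
    return q, visited
```

In this sketch the size of the Q-table depends only on the size of the stable set: calling the function with `n_states=1001` and an 11-state stable set allocates the same number of entries as the default call.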

Authors: ZHENG Yu (郑宇), LUO Siwei (罗四维), LÜ Ziang (吕子昂)

Affiliation: School of Computer and Information Technology, Beijing Jiaotong University

Source: Journal of Computer Applications (《计算机应用》), indexed in CSCD and the Peking University Core Journals list, 2008, No. 5, pp. 1328-1330, 1343 (4 pages in total)

Funding: National Natural Science Foundation of China (60373029)

Keywords: reinforcement learning; Markov decision process (MDP); stable state; inverted pendulum

Classification: TP18 [Automation and Computer Technology: Control Theory and Control Engineering]

References (10)

1. SUTTON R, BARTO A. Reinforcement learning: An introduction[M]. Cambridge, MA: MIT Press, 1998.
2. MURAO H, KITAMURA S. Q-learning with adaptive state segmentation (QLASS)[C]// Proceedings of the 1997 IEEE International Symposium on Computational Intelligence in Robotics and Automation (CIRA'97). Washington, DC: IEEE Computer Society, 1997: 179-184.
3. ZHANG Shuangmin, SHI Chunyi. A method for solving FMDP models based on feature vector extraction[J]. Journal of Software (软件学报), 2005, 16(5): 733-743. (Title translated from Chinese.)
4. MORALES E P. Relational state abstractions for reinforcement learning[C]// Proceedings of the 21st International Conference on Machine Learning (ICML 2004). New York, NY: ACM Press, 2004: 27-32.
5. MURATA M, OZAWA S. A reinforcement learning algorithm for a class of dynamical environments using neural networks[C]// SICE 2003 Annual Conference. Washington, DC: IEEE Computer Society, 2003: 2004-2009.
6. QIAN Zheng, SUN Liang, RUAN Xiaogang. Research on an adaptive control method based on recurrent neural networks[J]. Microcomputer Information (微计算机信息), 2005, 21(11S): 88-90. (Title translated from Chinese.)
7. SEKINO M, KATAGAMI D, NITTA K. State spaces self organization based on the interaction between basis functions[C]// Proceedings of the 2005 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2005). Washington, DC: IEEE Computer Society, 2005: 2929-2934.
8. ORMONEIT D, GLYNN P. Kernel-based reinforcement learning in average-cost problems[J]. IEEE Transactions on Automatic Control, 2002, 47(10): 1624-1636.
9. RUDOLPH P. Adaptive control of human posture using reinforcement learning[D]. Cleveland, OH: Cleveland State University, 2003.
10. MITCHELL T. Machine learning[M]. Columbus, OH: McGraw-Hill, 1997.
