Abstract
Reinforcement learning is difficult to apply to the control of complex systems because the number of exploration trials grows exponentially as the state space expands. To overcome this problem, this paper proposes a reinforcement learning algorithm based on stable state space control. The algorithm takes finding the optimal control actions within the stable state space as its learning goal and concentrates exploration on the stable state space rather than on the entire state space of the system. Because the stable state space usually occupies only a very small fraction of the system's state space, the number of trials required by the algorithm does not grow exponentially as the state space expands.
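The idea of confining exploration to the stable state space can be illustrated with a minimal sketch. This is not the paper's algorithm: it is plain tabular Q-learning on a hypothetical one-dimensional toy task, and the function name `train`, the `radius` parameter, the reward scheme, and the random disturbance are all assumptions made for illustration. The point it shows is that if every episode starts inside the stable set and ends as soon as the state leaves it, Q-values are only ever created for stable states, so the table size depends on the stable set rather than on the full state space.

```python
import random


def train(n_states, radius=2, episodes=200, steps=50,
          alpha=0.5, gamma=0.9, eps=0.1, seed=0):
    """Tabular Q-learning that explores only a band of 'stable' states.

    Hypothetical toy task (not from the paper): states are cells
    0..n_states-1 on a line, the stable set is every cell within
    `radius` of the center, and a random disturbance perturbs the state.
    """
    rng = random.Random(seed)
    center = n_states // 2
    stable = set(range(center - radius, center + radius + 1))
    actions = (-1, +1)
    q = {}  # Q-values are created lazily, only for states actually visited

    for _ in range(episodes):
        s = rng.choice(sorted(stable))  # each episode starts in the stable set
        for _ in range(steps):
            qs = q.setdefault(s, {a: 0.0 for a in actions})
            # Epsilon-greedy action selection
            if rng.random() < eps:
                a = rng.choice(actions)
            else:
                a = max(actions, key=lambda x: qs[x])
            disturbance = rng.choice((-1, 0, 1))
            s2 = min(max(s + a + disturbance, 0), n_states - 1)
            left_stable = s2 not in stable
            reward = -1.0 if left_stable else 0.0
            if left_stable:
                # Leaving the stable set is terminal: no bootstrapping
                # from (and no exploration of) outside states.
                target = reward
            else:
                next_qs = q.get(s2, {b: 0.0 for b in actions})
                target = reward + gamma * max(next_qs.values())
            qs[a] += alpha * (target - qs[a])
            if left_stable:
                break
            s = s2
    return q, stable


q_small, stable_small = train(n_states=11)
q_large, stable_large = train(n_states=1001)
print(len(q_small), len(q_large))
```

In this sketch the number of table entries is bounded by the size of the stable set (five cells with the default `radius`), whether the line has 11 or 1001 cells. The toy state space grows only linearly, so it illustrates the restriction of exploration, not the exponential growth discussed in the abstract.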
Source
《计算机应用》
CSCD
PKU Core Journal (北大核心)
2008, Issue 5, pp. 1328-1330, 1343 (4 pages)
Journal of Computer Applications
Funding
Supported by the National Natural Science Foundation of China (60373029)
Keywords
reinforcement learning
Markov Decision Process (MDP)
stable state
inverted pendulum