2 articles found
1. Convergence Analysis of the Grid Discretization Method in the Q-Learning Algorithm (Q学习算法中网格离散化方法的收敛性分析). Cited by: 9
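Authors: Jiang Guofei (蒋国飞), Gao Huiqi (高慧琪), Wu Cangpu (吴沧浦). Control Theory & Applications (《控制理论与应用》). Indexed in: EI, CAS, CSCD, PKU Core (北大核心). 1999, No. 2, pp. 194-198 (5 pages)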
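Abstract: Q-learning, proposed by Watkins [1], is a reinforcement learning method for solving Markov decision problems with incomplete information. To apply Q-learning to stochastic optimal control problems with continuous state and decision spaces, the state and decision spaces must first be discretized. In this paper, we prove that, provided certain Lipschitz continuity conditions hold and the relevant sets are compact, as the grid density increases, the optimal solution obtained by Q-learning on the discretized spaces converges in probability ...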
Keywords: Q-learning algorithm; grid discretization; convergence; Markov decision
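The discretization step described in this abstract can be illustrated with a minimal sketch. It is not the authors' construction: the one-dimensional dynamics in `step`, the cost function, the three-point action grid, and all parameter values are hypothetical choices made only to show tabular Q-learning running on a grid laid over a continuous state space.

```python
import random


def to_cell(x, n_cells):
    """Map a continuous state x in [0, 1] to the index of its grid cell."""
    return min(int(x * n_cells), n_cells - 1)


def step(x, u, rng):
    """Hypothetical toy dynamics (an assumption, not from the paper):
    a noisy move of size 0.1 * u, clipped to [0, 1].
    The stage cost is the squared distance of the new state from 0.5."""
    x_next = min(1.0, max(0.0, x + 0.1 * u + rng.gauss(0.0, 0.01)))
    return x_next, (x_next - 0.5) ** 2


def q_learning(n_cells=10, actions=(-1.0, 0.0, 1.0), episodes=2000,
               horizon=20, gamma=0.9, alpha=0.1, eps=0.2, seed=0):
    """Tabular Q-learning (cost minimization) on the discretized problem."""
    rng = random.Random(seed)
    q = [[0.0] * len(actions) for _ in range(n_cells)]
    for _ in range(episodes):
        x = rng.random()  # continuous initial state
        for _ in range(horizon):
            s = to_cell(x, n_cells)
            if rng.random() < eps:
                a = rng.randrange(len(actions))  # explore
            else:
                a = min(range(len(actions)), key=lambda k: q[s][k])  # exploit
            x_next, cost = step(x, actions[a], rng)
            s_next = to_cell(x_next, n_cells)
            # Standard Q-learning update, written for costs (min, not max).
            target = cost + gamma * min(q[s_next])
            q[s][a] += alpha * (target - q[s][a])
            x = x_next
    return q


if __name__ == "__main__":
    table = q_learning()
    for cell, row in enumerate(table):
        best = min(range(len(row)), key=lambda k: row[k])
        print(cell, best)
```

Raising `n_cells` (and refining the action grid) corresponds to increasing the grid density that the abstract refers to; the sketch itself makes no claim about the rate or mode of convergence.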
2. Feature-Based Aggregation and Deep Reinforcement Learning: A Survey and Some New Implementations. Cited by: 14
Author: Dimitri P. Bertsekas. IEEE/CAA Journal of Automatica Sinica. Indexed in: EI, CSCD. 2019, No. 1, pp. 1-31 (31 pages)
Abstract: In this paper we discuss policy iteration methods for approximate solution of a finite-state discounted Markov decision problem, with a focus on feature-based aggregation methods and their connection with deep reinforcement learning schemes. We introduce features of the states of the original problem, and we formulate a smaller "aggregate" Markov decision problem, whose states relate to the features. We discuss properties and possible implementations of this type of aggregation, including a new approach to approximate policy iteration. In this approach the policy improvement operation combines feature-based aggregation with feature construction using deep neural networks or other calculations. We argue that the cost function of a policy may be approximated much more accurately by the nonlinear function of the features provided by aggregation than by the linear function of the features provided by neural network-based reinforcement learning, thereby potentially leading to more effective policy improvement.
Keywords: reinforcement learning; dynamic programming; Markovian decision problems; aggregation; feature-based architectures; policy iteration; deep neural networks; rollout algorithms
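The idea of a smaller "aggregate" problem can be sketched in its simplest form, hard aggregation, where original states that share a feature value are grouped into one aggregate state. This is a simplified illustration and not the paper's implementation: the function names, the uniform weighting of states within a group, and the assumption that every group is non-empty are all choices made here for the example.

```python
def aggregate_value_iteration(P, g, groups, alpha=0.9, iters=500):
    """Value iteration on a hard-aggregation problem (illustrative sketch).

    P[u][i][j] : transition probability from state i to j under action u
    g[u][i]    : expected stage cost at state i under action u
    groups[i]  : index of the aggregate state that original state i belongs to
                 (assumed: every index 0 .. max(groups) is used at least once)
    Returns r, one cost value per aggregate state.
    """
    n = len(groups)
    m = max(groups) + 1
    members = [[i for i in range(n) if groups[i] == x] for x in range(m)]
    r = [0.0] * m
    for _ in range(iters):
        # Bellman backup at each original state, using the piecewise-constant
        # approximation J(j) = r[groups[j]] for the cost of the next state.
        backed = [
            min(
                g[u][i] + alpha * sum(P[u][i][j] * r[groups[j]] for j in range(n))
                for u in range(len(P))
            )
            for i in range(n)
        ]
        # Assumed uniform weighting: average the backups within each group.
        r = [sum(backed[i] for i in members[x]) / len(members[x]) for x in range(m)]
    return r


def approximate_cost(r, groups, i):
    """Approximate cost of original state i read off its aggregate state."""
    return r[groups[i]]


if __name__ == "__main__":
    # Four original states, two aggregate states, one action (toy data).
    P = [[[0, 0, 0.5, 0.5], [0, 0, 0.5, 0.5], [0, 0, 1, 0], [0, 0, 0, 1]]]
    g = [[1.0, 1.0, 0.0, 0.0]]
    r = aggregate_value_iteration(P, g, [0, 0, 1, 1])
    print(r, approximate_cost(r, [0, 0, 1, 1], 0))
```

The resulting approximation is piecewise constant over the groups, which is one concrete sense in which aggregation yields a nonlinear function of the features; how the features themselves are constructed (for example by a neural network) is outside this sketch.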