Abstract
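References [1]-[3] discuss, each under its own conditions, non-stationary discounted Markovian decision models and conditions for the existence of ε(≥0)-optimal policies. References [4] and [5] treat stationary discounted Markovian decision models with unbounded rewards that are absolute average relatively bounded, under the condition that both the state set and the action sets are countable. In this paper, with the state set still countable and the action sets arbitrary, we establish the non-stationary discounted Markovian decision model corresponding to [4]; give a finite-horizon approximation of the model and establish the optimality equations; and prove ε(>0)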
In this paper, a non-stationary discounted Markovian decision model is investigated under absolute average relatively bounded reward functions. The optimality equations for the model are established. The existence of an ε-optimal policy is proved. Necessary and sufficient conditions for the optimality of a policy are derived. It is shown that if there is an optimal policy, then there exists an optimal Markovian policy. We also discuss the optimality of a convex combination of these optimal policies. Finally, some properties of these optimal policies are shown.
Source
《应用数学学报》
CSCD
PKU Core Journal (北大核心)
1990, No. 3, pp. 314-323 (10 pages)
Acta Mathematicae Applicatae Sinica