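Abstract
Finite Markov decision models under the vector-valued, discrete-time average criterion are discussed. Under the assumption that the Markov decision process induced by any deterministic stationary policy is ergodic, it is proved that there exists a stationary optimal policy that is randomized in at most K-1 states, and a linear programming algorithm for computing it is given. It is also proved that a necessary and sufficient condition for the existence of a strongly optimal policy is the existence of a strongly optimal deterministic stationary policy.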
The vector-valued Markov decision model is considered. It is assumed that the state and action spaces are finite and that the law of motion is unichain, i.e., every pure policy gives rise to a Markov chain with one recurrent class. It is proved that there exists an optimal stationary policy with a degree of randomization no more than K. A linear program producing the optimal policy is presented.
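The objects named in the abstract can be illustrated with a minimal Python sketch. It is not the paper's linear program. It only evaluates the vector-valued average reward of a given randomized stationary policy on a small unichain model and counts the states in which that policy randomizes. All names (`stationary_distribution`, `average_reward_vector`, `randomized_states`) and all numbers in the toy model are assumptions made for illustration.

```python
def stationary_distribution(P, iters=2000):
    """Approximate the stationary distribution of a unichain transition matrix P."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iters):
        # Iterate on the lazy chain (P + I) / 2: it has the same stationary
        # distribution as P but is aperiodic, so the iteration converges.
        pi = [0.5 * pi[j] + 0.5 * sum(pi[i] * P[i][j] for i in range(n))
              for j in range(n)]
    return pi


def average_reward_vector(policy, trans, reward):
    """Long-run average reward vector of a stationary policy.

    policy[s][a]   probability of choosing action a in state s
    trans[s][a][t] probability of moving from s to t under action a
    reward[s][a]   K-dimensional reward vector (a list of K numbers)
    """
    n = len(policy)
    K = len(reward[0][0])
    # Transition matrix of the Markov chain induced by the policy.
    P = [[sum(policy[s][a] * trans[s][a][t] for a in range(len(policy[s])))
          for t in range(n)] for s in range(n)]
    pi = stationary_distribution(P)
    return [sum(pi[s] * policy[s][a] * reward[s][a][k]
                for s in range(n) for a in range(len(policy[s])))
            for k in range(K)]


def randomized_states(policy, tol=1e-12):
    """Number of states in which the policy mixes two or more actions."""
    return sum(1 for row in policy if sum(1 for p in row if p > tol) > 1)


# Hypothetical toy model: 2 states, 2 actions, K = 2 reward components.
# Every transition probability is positive, so every pure policy is ergodic.
trans = [[[0.5, 0.5], [0.1, 0.9]],
         [[0.5, 0.5], [0.9, 0.1]]]
reward = [[[1.0, 0.0], [0.0, 1.0]],
          [[2.0, 0.0], [0.0, 2.0]]]
# Randomize in state 0 only; act deterministically in state 1.
policy = [[0.5, 0.5], [1.0, 0.0]]

gain = average_reward_vector(policy, trans, reward)
count = randomized_states(policy)
print(gain, count)
```

With K = 2, the policy above randomizes in exactly one state, which matches the K-1 bound stated in the Chinese abstract; whether such a policy is optimal depends on the vector-valued criterion, which this sketch does not implement.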
Source
《西北师范大学学报(自然科学版)》
CAS
1994, No. 3, pp. 16-19 (4 pages)
Journal of Northwest Normal University (Natural Science)
Funding
Natural Science Foundation of the Gansu Provincial Education Commission
Keywords
vector-valued
average criterion
Markov decision process
finite Markov decision model, optimal policy, vector value, average criterion