Based on the theory of Markov performance potentials and the neuro-dynamic programming (NDP) methodology, we study a simulation optimization algorithm for a class of continuous-time Markov decision processes (CTMDPs) under randomized stationary policies. The proposed algorithm estimates the gradient of the average-cost performance measure with respect to the policy parameters by transforming a continuous-time Markov process into a uniform Markov chain and simulating a single sample path of the chain. The goal is to find a suboptimal randomized stationary policy. The algorithm derived here can meet the needs of performance optimization for many difficult systems with large-scale state spaces. Finally, a numerical example for a controlled Markov process is provided.
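The transformation mentioned in the abstract is the standard uniformization construction: a continuous-time Markov process with generator Q is turned into a discrete-time (uniform) Markov chain whose transition matrix is P = I + Q/Λ for any rate Λ at least as large as the fastest exit rate. A minimal sketch, with an invented 3-state generator (the specific rates are illustrative assumptions, not from the paper):

```python
import numpy as np

# Hypothetical 3-state generator matrix Q of a continuous-time Markov
# process: off-diagonal entries are transition rates, each row sums to zero.
Q = np.array([
    [-2.0,  1.0,  1.0],
    [ 0.5, -1.5,  1.0],
    [ 1.0,  2.0, -3.0],
])

# Uniformization: choose a rate Lambda >= max_i |q_ii| and form
#   P = I + Q / Lambda,
# the transition matrix of the uniform Markov chain. Observing this chain
# at the jump times of a Poisson(Lambda) clock reproduces the law of the
# original continuous-time process.
Lam = np.max(-np.diag(Q))              # smallest admissible uniformization rate
P = np.eye(Q.shape[0]) + Q / Lam

# P is a proper stochastic matrix (nonnegative entries, rows summing to 1),
# so a single sample path of the discrete chain can be simulated and used
# for sample-path gradient estimation.
assert np.all(P >= 0)
assert np.allclose(P.sum(axis=1), 1.0)
print(P)
```

Once the chain is in this form, gradient estimates of the average-cost measure can be accumulated along one simulated trajectory of P, which is what makes the approach viable for large state spaces.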
The continuous-time Markov decision programming (shortly, CTMDP) with discounted return criterion investigated in this note is {S, [(A(i), 𝒜(i)), i ∈ S], q, r, α}. In this model the state set S is countable; the action set A(i) is non-empty; 𝒜(i) is a σ-algebra on A(i) which contains all single-point sets of A(i); the family of transition rates q(j | i, a)
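Under a fixed stationary policy, a model of this form reduces to a continuous-time Markov reward process, and the α-discounted return satisfies a simple linear system. A minimal sketch under that standard reduction (the 2-state generator, reward rates, and discount factor below are invented for illustration and do not come from the note):

```python
import numpy as np

# For a fixed stationary policy the CTMDP {S, A, q, r, alpha} collapses to a
# generator matrix Q and a reward-rate vector r over the states. The
# alpha-discounted value  V(i) = E_i[ integral_0^inf e^{-alpha t} r(X_t) dt ]
# solves the linear system  (alpha*I - Q) V = r.
Q = np.array([[-1.0,  1.0],
              [ 2.0, -2.0]])    # illustrative transition rates q(j | i)
r = np.array([3.0, 1.0])        # illustrative reward rates r(i)
alpha = 0.1                     # illustrative discount rate alpha > 0

V = np.linalg.solve(alpha * np.eye(2) - Q, r)

# Sanity check: V satisfies the defining linear system.
assert np.allclose((alpha * np.eye(2) - Q) @ V, r)
print(V)
```

With positive reward rates and α > 0, the discounted values are finite and positive; the optimization problem in the note is then to choose the policy (and hence Q and r) maximizing this return.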
This paper deals with continuous-time Markov decision programming (briefly, CTMDP) with unbounded reward rate. The economic criterion is the long-run average reward. For models with countable state space and compact metric action sets, we present a set of sufficient conditions that ensure the existence of stationary optimal policies.
Funding: This paper was prepared with the support of the National Youth Science Foundation.