Decision makers often face the need to guarantee performance with some sufficiently high probability. Such problems can be modelled using a discrete-time Markov decision process (MDP) with a probability criterion for first achieving a target value. The objective is to find a policy that maximizes the probability of the total discounted reward exceeding a target value in the preceding stages. We show that our formulation cannot be described by former models with standard criteria. We provide the properties of the objective functions, optimal value functions and optimal policies. An algorithm for computing the optimal policies for the finite horizon case is given. In this stochastic stopping model, we prove that there exists an optimal deterministic and stationary policy and that the optimality equation has a unique solution. Using perturbation analysis, we approximate general models and prove the existence of an ε-optimal policy for finite state space. We give an example on the reliability of satellite systems using the above theory. Finally, we extend these results to more general cases.
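To make the criterion concrete, the display below sketches the standard target-level recursion consistent with this abstract; the notation (states x, actions a, reward r(x, a), discount β, target λ) is assumed here for illustration and is not taken from the paper.

```latex
% A sketch of the probability criterion and its target-level optimality
% equation; the notation (x, a, r, beta, lambda) is assumed, not the paper's.
\[
  V^{*}(x,\lambda)
    = \sup_{\pi}\; \mathbb{P}_{\pi}\!\Bigl( \exists\, n \ge 0 :
        \textstyle\sum_{t=0}^{n} \beta^{t} r(x_{t},a_{t}) \ge \lambda
        \Bigm| x_{0} = x \Bigr),
\]
\[
  V^{*}(x,\lambda)
    = \max_{a} \sum_{y} p(y \mid x,a)\,
        V^{*}\!\Bigl(y,\, \frac{\lambda - r(x,a)}{\beta}\Bigr)
    \quad (\lambda > 0), \qquad
  V^{*}(x,\lambda) = 1 \quad (\lambda \le 0).
\]
```

The second argument tracks the remaining target: collecting reward r(x, a) and discounting by β turns the event "total discounted reward ≥ λ" into the same event one step later with target (λ − r(x, a))/β, and the process stops with success probability one once the remaining target is nonpositive, which is the stopping structure the abstract refers to.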
This paper deals with Markov decision processes with a target set for nonpositive rewards. Two types of threshold probability criteria are discussed. The first criterion is the probability that the total reward is not greater than a given initial threshold value, and the second is the probability that the total reward is strictly less than it. Our first (resp. second) optimization problem is to minimize the first (resp. second) threshold probability. These problems suggest that the threshold value is a permissible level of the total reward to reach a goal (the target set); that is, we would reach this set over the level, if possible. For both problems, we show that 1) the optimal threshold probability is a unique solution to an optimality equation, 2) there exists an optimal deterministic stationary policy, and 3) a value iteration and a policy space iteration are given. In addition, we prove that the first (resp. second) optimal threshold probability is a monotone increasing and right (resp. left) continuous function of the initial threshold value, and we propose a method to obtain an optimal policy and the optimal threshold probability in the first problem by using those of the second problem.
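As a rough illustration of item 3), the sketch below runs value iteration for the first criterion on a toy target-set model. The states, actions, rewards, transitions and the integer threshold grid are all invented for the example (not from the paper), and the threshold is carried as a second state coordinate.

```python
# Value-iteration sketch for the first threshold criterion: minimize
# P(total nonpositive reward <= lambda) before absorption in the target set.
# All model data below is illustrative, not from the paper.

STATES = [0, 1, 2]
ACTIONS = [0, 1]
TARGET = {2}                                              # absorbing target set
REWARD = {(0, 0): -1, (0, 1): -2, (1, 0): -1, (1, 1): 0}  # nonpositive integers
TRANS = {                                 # TRANS[(x, a)] = {y: p(y | x, a)}
    (0, 0): {0: 0.5, 1: 0.5},
    (0, 1): {1: 0.7, 2: 0.3},
    (1, 0): {0: 0.4, 2: 0.6},
    (1, 1): {1: 0.9, 2: 0.1},
}
GRID = range(-10, 1)                      # integer thresholds -10, ..., 0

# Since all rewards are nonpositive, the total reward is <= 0 and the
# threshold probability is identically 1 once lambda >= 0; on the target
# set the remaining reward is 0, so the boundary value is 1{lambda >= 0}.
F = {}
for x in STATES:
    for lam in GRID:
        if x in TARGET:
            F[(x, lam)] = 1.0 if lam >= 0 else 0.0
        else:
            F[(x, lam)] = 1.0             # trivial upper bound to iterate down from

for _ in range(500):                      # value iteration
    newF = dict(F)
    for x in STATES:
        if x in TARGET:
            continue
        for lam in GRID:
            best = 1.0
            for a in ACTIONS:
                # collecting reward r shifts the remaining threshold to lam - r;
                # clip at 0 because F(., lam') == 1 for every lam' >= 0
                nxt = min(lam - REWARD[(x, a)], 0)
                q = sum(p * F[(y, nxt)] for y, p in TRANS[(x, a)].items())
                best = min(best, q)
            newF[(x, lam)] = best
    if max(abs(newF[k] - F[k]) for k in F) < 1e-12:
        F = newF
        break
    F = newF

print(F[(0, -3)])   # minimal P(total reward <= -3) starting from state 0
```

Integer rewards keep the shifted thresholds on the finite grid, so the augmented state space (state, remaining threshold) stays finite and the iteration is a plain table update; this is only a toy instance of the value iteration whose convergence the paper establishes.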
Funding: We thank the referees for their valuable comments and suggestions. This work was supported by the National Natural Science Foundation of China (Grant No. 19871046).