Abstract
This paper surveys the current state of the theory of semi-Markov decision processes (SMDPs). It covers the infinite-horizon expected discounted reward criterion, the long-run expected average reward criterion, the finite-horizon expected reward criterion, the expected first-passage reward criterion, the probability criterion, constrained problems, and the mean-variance criterion. For each of these optimality criteria, it describes the background, the significance, and the main research progress, and points out problems that remain open. It closes with an outlook on potential research directions and related problems for SMDPs.
Source
《中国科学:数学》
CSCD
PKU Core (Peking University Core Journals list)
2015, No. 5, pp. 477-496 (20 pages)
Scientia Sinica:Mathematica
Funding
Supported by the National Natural Science Foundation of China (Grant Nos. 11471341 and 61374067)