Abstract
The semi-Markov decision process (SMDP) optimization problem is studied using an M-step look-ahead asynchronous policy iteration (PI) algorithm based on performance potentials. First, an M-step look-ahead PI algorithm built on performance potential theory is presented. The algorithm subsumes both standard PI and general asynchronous PI as special cases, and treats SMDP optimization under the discounted and average criteria in a unified way. Next, M-step look-ahead simulation-based PI algorithms using temporal-difference (TD) learning are given for both performance criteria. Finally, a numerical example compares the features of the algorithms.
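To make the idea concrete, here is a minimal sketch of potential-based, M-step look-ahead policy iteration under the average-reward criterion on a small MDP (an SMDP with unit sojourn times). This is a generic, synchronous illustration of the technique, not the authors' exact asynchronous algorithm; the two-state model, array shapes, and function names are assumptions for the example.

```python
import numpy as np

def stationary_dist(P):
    """Stationary distribution pi with pi P = pi, sum(pi) = 1."""
    n = P.shape[0]
    A = np.vstack([P.T - np.eye(n), np.ones((1, n))])
    b = np.concatenate([np.zeros(n), [1.0]])
    return np.linalg.lstsq(A, b, rcond=None)[0]

def potentials(P, r):
    """Average reward eta and potentials g from the Poisson equation
    (I - P) g = r - eta, normalized by pi g = 0."""
    n = P.shape[0]
    pi = stationary_dist(P)
    eta = float(pi @ r)
    A = np.vstack([np.eye(n) - P, pi[None, :]])
    b = np.concatenate([r - eta, [0.0]])
    g = np.linalg.lstsq(A, b, rcond=None)[0]
    return eta, g

def policy_iteration(P_sa, r_sa, M=2, iters=50):
    """P_sa[a, s, s'] transition kernels, r_sa[s, a] rewards.
    Each round evaluates the current policy via its potentials,
    then improves greedily with an M-step look-ahead backup."""
    nA, nS, _ = P_sa.shape
    policy = np.zeros(nS, dtype=int)
    for _ in range(iters):
        P = P_sa[policy, np.arange(nS), :]       # (nS, nS) under policy
        r = r_sa[np.arange(nS), policy]          # (nS,)
        eta, g = potentials(P, r)
        V = g.copy()
        for _ in range(M):                       # M-step look-ahead
            Q = r_sa - eta + (P_sa @ V).T        # Q[s, a]
            new_policy = Q.argmax(axis=1)
            V = Q.max(axis=1)
        if np.array_equal(new_policy, policy):   # policy is stable
            return policy, eta
        policy = new_policy
    return policy, eta

# Hypothetical 2-state, 2-action model: action 1 always pays 1, action 0 pays 0.
P_sa = np.full((2, 2, 2), 0.5)                   # uniform transitions
r_sa = np.array([[0.0, 1.0], [0.0, 1.0]])
policy, eta = policy_iteration(P_sa, r_sa, M=2)  # -> policy [1, 1], eta 1.0
```

With M = 1 this reduces to the usual one-step greedy improvement; larger M trades extra backup computation per round for potentially fewer evaluation rounds.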
Source
《吉林大学学报(工学版)》 (Journal of Jilin University: Engineering and Technology Edition)
Indexed in: EI, CAS, CSCD, Peking University Core Journals (北大核心)
2006, No. 6, pp. 958-962 (5 pages)
Funding
National Natural Science Foundation of China (60404009)
Natural Science Foundation of Anhui Province (050420303)
Young and Middle-aged Science and Technology Innovation Group Program of Hefei University of Technology
Keywords
computer application
semi-Markov decision process (SMDP)
performance potential
M-step look-ahead policy iteration
temporal difference (TD) learning