
M-step look-ahead policy iteration for semi-Markov decision processes based on performance potentials
Abstract: Semi-Markov decision processes (SMDPs) are studied via an M-step look-ahead asynchronous policy iteration (PI) algorithm based on performance potentials. First, an M-step look-ahead PI algorithm built on the solution of performance potential theory is presented. The algorithm applies to standard PI as well as to conventional asynchronous PI, and it treats SMDP optimization under the discounted and average criteria in a unified way. In addition, an M-step look-ahead simulation-based PI using temporal-difference (TD) learning is given for both performance criteria. Finally, the features of these algorithms are compared through a numerical example.
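The abstract names the technique but gives no detail. As a rough illustration of the general idea, a minimal sketch of M-step look-ahead policy iteration on a small finite discounted MDP might look as follows. This is not the paper's SMDP/performance-potential formulation: the transition data, reward values, and all function names below are invented for illustration only.

```python
import numpy as np

def evaluate(policy, P, r, gamma):
    """Exact policy evaluation: solve v = r_pi + gamma * P_pi v."""
    n = P.shape[1]
    P_pi = P[policy, np.arange(n)]   # (n, n): transition rows chosen by policy
    r_pi = r[policy, np.arange(n)]   # (n,):  one-step rewards under policy
    return np.linalg.solve(np.eye(n) - gamma * P_pi, r_pi)

def m_step_q(s, a, P, r, gamma, v, m):
    """M-step look-ahead action value: expand the Bellman recursion m steps,
    then bootstrap with the evaluated value function v at the horizon."""
    if m == 1:
        return r[a, s] + gamma * P[a, s] @ v
    # Greedy (m-1)-step values at every successor state.
    nxt = np.array([max(m_step_q(s2, b, P, r, gamma, v, m - 1)
                        for b in range(P.shape[0]))
                    for s2 in range(P.shape[1])])
    return r[a, s] + gamma * P[a, s] @ nxt

def policy_iteration(P, r, gamma=0.9, m=2, iters=50):
    """Policy iteration whose improvement step is greedy with respect to
    the M-step look-ahead action values rather than the one-step values."""
    n_actions, n_states = r.shape
    policy = np.zeros(n_states, dtype=int)
    v = np.zeros(n_states)
    for _ in range(iters):
        v = evaluate(policy, P, r, gamma)
        new = np.array([max(range(n_actions),
                            key=lambda a: m_step_q(s, a, P, r, gamma, v, m))
                        for s in range(n_states)])
        if np.array_equal(new, policy):
            break
        policy = new
    return policy, v
```

For a two-state example where action 0 stays put and action 1 switches states, with reward 1 only for staying in state 1 (`P = np.array([[[1,0],[0,1]],[[0,1],[1,0]]], dtype=float)`, `r = np.array([[0.,1.],[0.,0.]])`), the sketch converges to the policy "move to state 1, then stay." Larger `m` trades more expansion work per improvement step for faster convergence in iterations, which is the trade-off the paper's numerical example probes.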
Source: Journal of Jilin University (Engineering and Technology Edition), 2006, No. 6, pp. 958-962 (5 pages). Indexed in EI, CAS, CSCD; Peking University Core Journal.
Funding: National Natural Science Foundation of China (60404009); Anhui Provincial Natural Science Foundation (050420303); Hefei University of Technology Young and Middle-aged Scientific and Technological Innovation Group Program.
Keywords: computer application; semi-Markov decision process (SMDP); performance potential; M-step look-ahead policy iteration; temporal-difference (TD) learning