Experience Replay for Least-Squares Policy Iteration (cited by: 1)

Abstract: Policy iteration, which evaluates and improves the control policy iteratively, is a reinforcement learning method. Policy evaluation with the least-squares method extracts more useful information from the empirical data and therefore improves data efficiency. However, most existing online least-squares policy iteration methods use each sample only once, resulting in a low utilization rate. To improve utilization efficiency, we propose experience replay for least-squares policy iteration (ERLSPI) and prove its convergence. The ERLSPI method combines online least-squares policy iteration with experience replay: it stores the samples generated online and reuses them with the least-squares method to update the control policy. We apply the ERLSPI method to the inverted pendulum system, a typical benchmark. The experimental results show that the method can effectively take advantage of previous experience and knowledge, improve the utilization efficiency of the empirical data, and accelerate convergence.
Source: IEEE/CAA Journal of Automatica Sinica (自动化学报(英文版)), SCIE, EI; 2014, Issue 3, pp. 274-281 (8 pages)
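
The idea in the abstract (store samples generated online, then reuse them in least-squares policy evaluation) can be illustrated with a minimal sketch. This is not the paper's algorithm as published: the abstract does not give the update rule, the basis functions, or the replay schedule. The sketch assumes a batch LSTD-Q solve over the whole buffer, one-hot features, and a hypothetical five-state chain MDP in place of the inverted pendulum. All names (`step`, `phi`, `lstdq`, `erlspi_sketch`) and all parameter values are illustrative assumptions.

```python
import numpy as np

N_STATES, N_ACTIONS, GAMMA = 5, 2, 0.9  # hypothetical toy problem sizes

def step(s, a):
    """Hypothetical chain MDP: action 1 moves right, action 0 moves left.
    Entering the last state yields reward 1."""
    s2 = min(s + 1, N_STATES - 1) if a == 1 else max(s - 1, 0)
    return (1.0 if s2 == N_STATES - 1 else 0.0), s2

def phi(s, a):
    """One-hot feature vector for the state-action pair (an assumption)."""
    f = np.zeros(N_STATES * N_ACTIONS)
    f[s * N_ACTIONS + a] = 1.0
    return f

def q_values(w, s):
    return np.array([phi(s, a) @ w for a in range(N_ACTIONS)])

def greedy(w, s):
    return int(np.argmax(q_values(w, s)))

def behave(w, s, eps, rng):
    """Epsilon-greedy behaviour policy with random tie-breaking."""
    if rng.random() < eps:
        return int(rng.integers(N_ACTIONS))
    q = q_values(w, s)
    return int(rng.choice(np.flatnonzero(q == q.max())))

def lstdq(samples, w):
    """Evaluate the policy that is greedy w.r.t. w on the stored samples."""
    k = N_STATES * N_ACTIONS
    A = 1e-3 * np.eye(k)  # small ridge term keeps A invertible
    b = np.zeros(k)
    for s, a, r, s2 in samples:
        f = phi(s, a)
        f2 = phi(s2, greedy(w, s2))
        A += np.outer(f, f - GAMMA * f2)
        b += f * r
    return np.linalg.solve(A, b)

def erlspi_sketch(n_steps=500, update_every=20, n_replays=3, eps=0.3, seed=0):
    rng = np.random.default_rng(seed)
    w = np.zeros(N_STATES * N_ACTIONS)
    buffer = []  # experience replay memory: (s, a, r, s') tuples
    s = 0
    for t in range(1, n_steps + 1):
        a = behave(w, s, eps, rng)
        r, s2 = step(s, a)
        buffer.append((s, a, r, s2))             # store the online sample
        s = 0 if s2 == N_STATES - 1 else s2      # restart after the goal
        if t % update_every == 0:
            for _ in range(n_replays):           # reuse every stored sample
                w = lstdq(buffer, w)             # evaluate, then improve
    return w

w = erlspi_sketch()
print([greedy(w, s) for s in range(N_STATES - 1)])
```

Each pass over the buffer re-evaluates the current greedy policy using every stored transition, so a sample collected early keeps contributing to later policy updates rather than being discarded after a single use.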