
Optimal control algorithm based on experience replay and Q-Learning

Cited by: 6
Abstract: To address the high computational cost of learning an optimal control policy online in a real-time system, an optimal control algorithm based on experience replay and Q-Learning is proposed. Experience replay (ER) reuses samples, which compensates for the small number of samples a real-time system can collect online. The parameter vector of the value function is updated with the Q-Learning algorithm and gradient descent. The resulting algorithm is named ER-Q-Learning, and its computational complexity is analyzed. Simulation results show that, compared with Q-Learning, Sarsa and the batch algorithm BLSPI, ER-Q-Learning balances for more time steps within a limited time and converges fastest.
Author: HUANG Xiao-yan (黄小燕) (Control Engineering School, Chengdu University of Information Technology, Chengdu 610225, China)
Source: Computer Engineering and Design (《计算机工程与设计》), Peking University Core Journal, 2017, No. 5, pp. 1352-1355, 1365 (5 pages)
Funding: National Natural Science Foundation of China (61502329)
Keywords: control strategy; experience replay; Q-Learning; real-time system; samples
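
The abstract names three ingredients: a parameterized value function, a Q-Learning update applied by gradient descent, and replay of stored samples. The following is a minimal sketch of how they might fit together, not the paper's implementation. The toy chain environment (`chain_step`), the one-hot features, uniform sampling from the buffer, and all hyperparameter values are assumptions made for illustration; the abstract does not specify them.

```python
import random

def features(state, action, n_states, n_actions):
    # Assumed one-hot features; the paper's basis functions are not given.
    phi = [0.0] * (n_states * n_actions)
    phi[state * n_actions + action] = 1.0
    return phi

def q_value(theta, phi):
    # Linear value function: Q(s, a) = theta . phi(s, a)
    return sum(t * f for t, f in zip(theta, phi))

def q_update(theta, sample, alpha, gamma, n_states, n_actions):
    # One Q-Learning step on the parameter vector. For a linear Q the
    # gradient with respect to theta is phi, so theta += alpha * td * phi.
    s, a, r, s_next, done = sample
    phi = features(s, a, n_states, n_actions)
    target = r
    if not done:
        target += gamma * max(
            q_value(theta, features(s_next, b, n_states, n_actions))
            for b in range(n_actions)
        )
    td_error = target - q_value(theta, phi)
    for i, f in enumerate(phi):
        theta[i] += alpha * td_error * f

def chain_step(state, action, n_states):
    # Hypothetical toy environment: action 1 moves right, action 0 moves
    # left; reaching the rightmost state gives reward 1 and ends the episode.
    nxt = min(state + 1, n_states - 1) if action == 1 else max(state - 1, 0)
    done = nxt == n_states - 1
    return nxt, (1.0 if done else 0.0), done

def er_q_learning(n_states=5, n_actions=2, episodes=200, replays=5,
                  alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    theta = [0.0] * (n_states * n_actions)
    buffer = []  # stored (s, a, r, s_next, done) samples
    for _ in range(episodes):
        s, done, steps = 0, False, 0
        while not done and steps < 100:
            # Epsilon-greedy action selection with random tie-breaking.
            if rng.random() < epsilon:
                a = rng.randrange(n_actions)
            else:
                qs = [q_value(theta, features(s, b, n_states, n_actions))
                      for b in range(n_actions)]
                best = max(qs)
                a = rng.choice([b for b in range(n_actions) if qs[b] == best])
            s_next, r, done = chain_step(s, a, n_states)
            sample = (s, a, r, s_next, done)
            buffer.append(sample)
            # Update on the fresh sample, then reuse stored samples.
            q_update(theta, sample, alpha, gamma, n_states, n_actions)
            for _ in range(replays):
                q_update(theta, rng.choice(buffer), alpha, gamma,
                         n_states, n_actions)
            s, steps = s_next, steps + 1
    return theta
```

Calling `er_q_learning()` returns the learned parameter vector; with one-hot features, `theta[s * n_actions + a]` is the estimate of Q(s, a). Each real interaction here triggers `replays` extra updates, which is the sample-reuse idea the abstract describes: more learning per online sample, at the price of more computation per step.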
