Abstract
To address the high computational cost of online optimal control strategy learning for real-time systems, an optimal control algorithm based on experience replay and Q-Learning is proposed. Experience replay (ER) reuses stored samples, compensating for the limited number of samples a real-time system can collect online. The Q-Learning algorithm, combined with gradient descent, updates the parameter vector of the value function. The resulting algorithm, named ER-Q-Learning, is defined and its computational complexity analyzed. Simulation results show that, compared with the Q-Learning, Sarsa, and batch BLSPI algorithms, ER-Q-Learning balances more time steps within a limited time and has the fastest convergence rate.
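The abstract describes combining a replay buffer with Q-Learning under linear value-function approximation, updating the parameter vector by gradient descent on the temporal-difference error. The sketch below illustrates that combination in minimal form; it is not the paper's implementation, and all class names, hyperparameters (`capacity`, `alpha`, `gamma`), and the per-action linear parameterization are illustrative assumptions.

```python
import random
import numpy as np

class ReplayBuffer:
    """Fixed-capacity buffer that stores transitions and resamples them,
    so each online sample can be reused for many updates (assumed design)."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.buffer = []
        self.pos = 0  # next slot to overwrite once the buffer is full

    def push(self, phi, a, r, phi_next, done):
        if len(self.buffer) < self.capacity:
            self.buffer.append((phi, a, r, phi_next, done))
        else:  # overwrite oldest transition (circular buffer)
            self.buffer[self.pos] = (phi, a, r, phi_next, done)
            self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size):
        return random.sample(self.buffer, min(batch_size, len(self.buffer)))

class ERQLearning:
    """Q-Learning with a linear value function Q(s,a) = w_a^T phi(s),
    updated by gradient descent on the TD error over replayed samples."""
    def __init__(self, n_features, n_actions, alpha=0.05, gamma=0.99):
        self.w = np.zeros((n_actions, n_features))  # one parameter vector per action
        self.alpha, self.gamma = alpha, gamma
        self.n_actions = n_actions

    def q(self, phi, a):
        return self.w[a] @ phi  # linear value estimate

    def update(self, batch):
        # One gradient-descent step per replayed transition on the squared
        # TD error; the gradient of Q(s,a) w.r.t. w_a is just phi(s).
        for phi, a, r, phi_next, done in batch:
            target = r if done else r + self.gamma * max(
                self.q(phi_next, b) for b in range(self.n_actions))
            td_error = target - self.q(phi, a)
            self.w[a] += self.alpha * td_error * phi
```

In a typical loop, each transition observed online is pushed into the buffer, and a small batch is resampled for an update at every step, which is what lets ER-style methods extract more learning from the few samples a real-time system can gather.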
Author
黄小燕
HUANG Xiao-yan(Control Engineering School, Chengdu University of Information Technology, Chengdu 610225, China)
Source
《计算机工程与设计》
Peking University Core Journal (北大核心)
2017, No. 5, pp. 1352-1355 and 1365 (5 pages)
Computer Engineering and Design
Funding
National Natural Science Foundation of China (61502329)
Keywords
control strategy
experience replay
Q-Learning
real-time system
samples