Abstract
Stochastic linear-quadratic (LQ) problems are an important and well-studied class of stochastic control problems. A stochastic LQ problem under partial information is one in which the state equation or the cost function of the system contains unknown coefficients. Building on previous work, this paper improves an online reinforcement learning algorithm for the optimal control of LQ problems under partial information. In the setting studied, both the system equation and the cost function contain unknown coefficients. Under this condition, the algorithm recovers the optimal control and the unknown coefficients of the cost function from observable sample trajectories and the reward function. We further prove the convergence of the iteration and the stability of the resulting controls.
Source
Technology Innovation and Application (《科技创新与应用》)
2024, No. 32, pp. 142-145 (4 pages)
Keywords
stochastic linear-quadratic problem
partial information
Lyapunov equation
reinforcement learning
dynamic programming principle