Funding: Supported in part by the National Science Foundation (Nos. ECCS-2210320, CNS-2148304).
Abstract: In this paper, we study the robustness of policy optimization (in particular, the Gauss-Newton gradient descent algorithm, which is equivalent to policy iteration in reinforcement learning) subject to noise at each iteration. By invoking the concept of input-to-state stability and using Lyapunov's direct method, it is shown that, if the noise is sufficiently small, the policy iteration algorithm converges to a small neighborhood of the optimal solution even in the presence of noise at each iteration. Explicit expressions are provided for the upper bound on the noise and for the size of the neighborhood to which the policies ultimately converge. Based on Willems' fundamental lemma, a learning-based policy iteration algorithm is proposed. The persistent excitation condition can be readily guaranteed by checking the rank of the Hankel matrix related to an exploration signal. The robustness of the learning-based policy iteration to measurement noise and unknown system disturbances is theoretically demonstrated through the input-to-state stability of the policy iteration. Several numerical simulations are conducted to demonstrate the efficacy of the proposed method.
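The rank check mentioned in the abstract can be illustrated with a minimal sketch. It assumes a scalar exploration signal and an absolute rank tolerance; the function names (`hankel`, `matrix_rank`, `is_persistently_exciting`) are hypothetical and not taken from the paper. A scalar signal is treated as persistently exciting of a given order when its Hankel matrix with that many rows has full row rank.

```python
def hankel(signal, depth):
    """Build the depth-row Hankel matrix of a scalar signal: H[i][j] = signal[i + j]."""
    cols = len(signal) - depth + 1
    if cols < 1:
        raise ValueError("signal is too short for the requested depth")
    return [[float(signal[i + j]) for j in range(cols)] for i in range(depth)]


def matrix_rank(rows, tol=1e-9):
    """Numerical rank via Gaussian elimination with partial pivoting.

    The absolute tolerance `tol` is an assumption of this sketch.
    """
    a = [row[:] for row in rows]  # work on a copy
    n_rows = len(a)
    n_cols = len(a[0]) if a else 0
    rank = 0
    for c in range(n_cols):
        if rank == n_rows:
            break
        # Pick the largest remaining entry in column c as the pivot.
        pivot = max(range(rank, n_rows), key=lambda r: abs(a[r][c]))
        if abs(a[pivot][c]) <= tol:
            continue  # no usable pivot in this column
        a[rank], a[pivot] = a[pivot], a[rank]
        for r in range(rank + 1, n_rows):
            f = a[r][c] / a[rank][c]
            for k in range(c, n_cols):
                a[r][k] -= f * a[rank][k]
        rank += 1
    return rank


def is_persistently_exciting(signal, order):
    """True if the order-row Hankel matrix of the signal has full row rank."""
    return matrix_rank(hankel(signal, order)) == order


if __name__ == "__main__":
    constant = [1.0] * 10          # a constant input excites only one direction
    pulse = [0.0, 0.0, 1.0, 0.0, 0.0]
    print(is_persistently_exciting(constant, 2))
    print(is_persistently_exciting(pulse, 3))
```

For a vector-valued exploration signal, the same idea would apply to a block Hankel matrix, with the required rank scaled by the input dimension.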
Funding: Supported by the National Science Foundation under Grant No. ECCS-1501044, the National Natural Science Foundation under Grant Nos. 61374042, 61522305, 61633007 and 61533007, and the State Key Laboratory of Intelligent Control and Decision of Complex Systems at BIT.
Abstract: This paper develops a large-scale small-gain result for dynamic networks composed of infinite-dimensional subsystems. The subsystems are assumed to be input-to-output stable (IOS) and unboundedness observable (UO), and the large-scale infinite-dimensional system is proved to be IOS and UO if the proposed small-gain condition is satisfied.
Funding: Supported by the National Natural Science Foundation of China (60872046) and the Key Discipline Development Program of Beijing Municipal Commission (XK100080537).
Funding: Supported by the High Technology Research and Development Program of Jilin (20130204021GX), the Specialized Research Fund for Graduate Course Identification System Program (Jilin University) of China (450060523183), the National Natural Science Foundation of China (61520106008, U1564207, 61503149), the Education Department of Jilin Province of China (2016430), and the Graduate Innovation Fund of Jilin University (2016030).
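Abstract: Because not all state variables of an electronic throttle system can be measured, an observer-based output-feedback electronic throttle control system is designed. The system consists of a reduced-order observer that estimates the unmeasurable states and a nonlinear state-feedback controller. An integral term of the tracking error is introduced into the controller to suppress the steady-state tracking error. Uncertainties such as modeling errors and observer errors are treated as external disturbances, the robustness of the tracking-error system is analyzed within the input-to-state stability (ISS) framework, and guidelines for selecting the controller parameters are given accordingly. Simulation and experimental results show that the observer-based output-feedback controller achieves good tracking control of the electronic throttle.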
Funding: Supported by the National Natural Science Foundation of China (10571036) and the Key Discipline Development Program of Beijing Municipal Commission (XK100080537).