Funding: supported by the National Science Foundation (No. ECCS-0801330) and the Army Research Office (No. W91NF-05-1-0314).
Abstract: This paper presents an approximate/adaptive dynamic programming (ADP) algorithm, based on the idea of integral reinforcement learning (IRL), that determines online the Nash equilibrium solution of the two-player zero-sum differential game with linear dynamics and an infinite-horizon quadratic cost. The algorithm is built around an iterative method, developed in the control engineering community, for solving the continuous-time game algebraic Riccati equation (CT-GARE) that underlies the game problem. We show how ADP techniques enhance the capabilities of the offline method, allowing an online solution without requiring complete knowledge of the system dynamics. The feasibility of the ADP scheme is demonstrated in simulation on a power system control application, where the adaptation goal is the control policy that optimally counteracts the worst-case load disturbance.
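An offline iterative solution of the CT-GARE can be sketched as follows. This is a minimal illustration under our own assumptions, not the paper's exact algorithm: it assumes the standard zero-sum GARE form AᵀP + PA + Q + γ⁻²PDDᵀP − PBR⁻¹BᵀP = 0 and freezes the disturbance term at the previous iterate, so that each step reduces to a standard ARE. The system matrices below are invented purely for the example.

```python
import numpy as np
from scipy.linalg import solve_continuous_are

# Hypothetical system matrices, chosen only for illustration.
A = np.array([[-1.0, 1.0], [0.0, -2.0]])
B = np.array([[0.0], [1.0]])   # minimiser (control) input
D = np.array([[1.0], [0.0]])   # maximiser (disturbance) input
Q = np.eye(2)
R = np.eye(1)
gamma = 5.0                    # attenuation level, above the critical value

def gare_residual(P):
    """Residual of the CT-GARE:
    A'P + PA + Q + gamma^-2 P D D' P - P B R^-1 B' P = 0."""
    return (A.T @ P + P @ A + Q
            + (1.0 / gamma**2) * P @ D @ D.T @ P
            - P @ B @ np.linalg.inv(R) @ B.T @ P)

# Offline iteration: freeze the disturbance term at the previous
# iterate, so each step is a standard ARE solvable by Schur methods.
P = np.zeros((2, 2))
for _ in range(100):
    Qi = Q + (1.0 / gamma**2) * P @ D @ D.T @ P
    P_next = solve_continuous_are(A, B, Qi, R)
    if np.linalg.norm(P_next - P) < 1e-10:
        P = P_next
        break
    P = P_next

print(np.linalg.norm(gare_residual(P)))  # near zero at convergence
```

The IRL scheme replaces the model-based ARE step with an online least-squares evaluation along trajectories, which is what removes the need for full knowledge of A.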
Abstract: We consider a finite-horizon, zero-sum linear quadratic differential game whose distinguishing feature is that the weight matrix of the minimiser's control cost in the cost functional is singular. Due to this singularity, the game can be solved neither by applying the Isaacs MinMax principle nor by the Bellman–Isaacs equation approach, i.e. the game is singular. A previous paper by one of the authors analysed such a game in the case where the cost functional contains no minimiser's control cost at all, i.e. the weight matrix of this cost equals zero; in that case, all coordinates of the minimiser's control are singular. In the present paper, we study the general case where the weight matrix of the minimiser's control cost, while singular, is not in general zero, meaning that only some coordinates of the minimiser's control are singular while the others are regular. The game is treated by regularisation, i.e. by its approximate conversion to an auxiliary regular game. The latter has the same dynamics and a similar cost functional, augmented by an integral of the squares of the singular control coordinates with a small positive weight. Thus, the auxiliary game is a partial cheap control differential game. Based on a singular perturbation asymptotic analysis of this auxiliary game, the existence of the value of the original (singular) game is established and its expression is obtained. The maximiser's optimal state-feedback strategy and the minimising control sequence in the original game are designed. It is shown that the coordinates of the minimising control sequence corresponding to the regular coordinates of the minimiser's control converge pointwise in the class of regular functions. The optimal trajectory sequence and the optimal trajectory in the considered singular game are also obtained. An illustrative example is presented.
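In our notation (not the authors'), the regularisation can be written as follows: split the minimiser's control into regular and singular coordinates, $u = (u_r, u_s)$, with the singular cost weight $R = \operatorname{diag}(R_r, 0)$, $R_r > 0$. The auxiliary partial cheap control game keeps the dynamics and replaces the cost functional $J$ by

```latex
J_\varepsilon \;=\; J \;+\; \varepsilon^2 \int_0^{t_f} \| u_s(t) \|^2 \, dt,
\qquad 0 < \varepsilon \ll 1,
```

so the effective control weight $\operatorname{diag}(R_r, \varepsilon^2 I)$ is nonsingular and the auxiliary game is regular; the original singular game is then recovered through the asymptotic analysis as $\varepsilon \to 0^{+}$.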
Abstract: This paper studies the policy iteration algorithm (PIA) for zero-sum stochastic differential games under the basic long-run average criterion, as well as under its more selective version, the so-called bias criterion. The system is assumed to be a nondegenerate diffusion. We use Lyapunov-like stability conditions that ensure the existence and boundedness of the solution to a certain Poisson equation. We also establish the convergence of a sequence of such solutions, of the corresponding sequence of policies, and, ultimately, of the PIA.
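For a fixed pair of stationary policies, the core object in the PIA is the Poisson equation linking the long-run average payoff (gain) and the bias. A minimal finite-state analogue can be solved directly; note the paper's setting is a nondegenerate diffusion, and the transition matrix and payoffs below are invented purely for illustration.

```python
import numpy as np

# Toy finite-state analogue: once both players fix stationary
# policies, the dynamics reduce to a transition matrix P and a
# payoff vector r (all values hypothetical).
P = np.array([[0.5, 0.5, 0.0],
              [0.2, 0.5, 0.3],
              [0.0, 0.4, 0.6]])
r = np.array([1.0, 0.0, 2.0])

# Poisson equation for the gain rho and the bias h:
#   rho * 1 + h = r + P h,  with the normalisation h[0] = 0.
n = len(r)
M = np.zeros((n + 1, n + 1))
M[:n, :n] = np.eye(n) - P     # (I - P) h
M[:n, n] = 1.0                # + rho * 1
M[n, 0] = 1.0                 # normalisation row: h[0] = 0
b = np.concatenate([r, [0.0]])
sol = np.linalg.solve(M, b)   # nonsingular for an irreducible chain
h, rho = sol[:n], sol[n]

print(rho)  # long-run average payoff under the fixed policies
```

The bias criterion then compares policies that attain the same gain rho by their bias h, which is why the PIA needs the Poisson equation's solution to exist and stay bounded along the iterations.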