11 articles found
An Online Q-Learning Method for Linear-Quadratic Nonzero-Sum Stochastic Differential Games with Completely Unknown Dynamics
1
Authors: ZHANG Bao-Qiang, WANG Bing-Chang, CAO Ying. Journal of Systems Science & Complexity (SCIE, EI, CSCD), 2024, No. 5, pp. 1907-1922 (16 pages)
In this paper, the authors design a reinforcement learning algorithm to solve the adaptive linear-quadratic stochastic n-player nonzero-sum differential game with completely unknown dynamics. For each player, a critic network is used to estimate the Q-function, and an actor network is used to estimate the control input. A model-free online Q-learning algorithm is obtained for solving this kind of problem. It is proved that, under some mild conditions, the system state and weight estimation errors are uniformly ultimately bounded. A simulation with five players is given to verify the effectiveness of the algorithm.
Keywords: actor-critic algorithm; model-free adaptive control; nonzero-sum stochastic game; reinforcement learning
Behavior Prediction of Untrusted Relays Based on Nonzero-Sum Game
2
Authors: 付晓梅, 吴晓, 汪清. Transactions of Tianjin University (EI, CAS), 2015, No. 4, pp. 371-376 (6 pages)
To keep secrecy performance from being badly degraded by untrusted relays (URs), a multi-UR network using an amplify-and-forward (AF) cooperative scheme is put forward, which takes relay weight and harmful factor into account. A nonzero-sum game is established to capture the interaction among URs and detection strategies. Secrecy capacity is investigated as the game payoff to indicate the untrusted behaviors of the relays. The maximum probabilities of the relay behaviors and the optimal system detection strategy can be obtained by using the proposed algorithm.
Keywords: physical layer security; cooperative communication; untrusted relay; secrecy capacity; nonzero-sum game
Basin Pollutant Discharge Permit Trading Based on the Concept of Equilibrium (Cited by: 2)
3
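Authors: 付意成, 阮本清, 臧文斌. 《重庆大学学报(自然科学版)》 (EI, CAS, CSCD, PKU Core), 2012, No. 9, pp. 114-120 (7 pages)
Pollutant discharge permit trading is an effective economic instrument for achieving balanced development between pollution control and water quality improvement in a river basin. After summarizing the characteristics of domestic and international research on basin pollutant discharge trading, the paper gives a research framework whose objective functions are minimizing pollution control cost and minimizing water quality risk at low water levels. By combining the applicable ranges of the improved genetic algorithm (NSGA-II), Young bargaining theory (YBT), and the initial discharge permit allocation (IDPA) model, a pollutant discharge permit trading framework covering the relevant elements of basin pollution control is constructed. Using a nonzero-sum game model and building on this framework, a pollutant discharge trading model aimed at balanced basin development and minimal pollution control cost is constructed. A typical basin is taken as an example: the optimal equilibrium outcome acceptable to all parties is given, which verifies the applicability of the model. Drawing on problems encountered in application, suggestions for improving the model and an outlook on its future use are given.
Keywords: equilibrium; basin pollutant discharge permit trading; multi-objective optimization model; nonzero-sum game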
Nonzero-sum games for continuous-time Markov chains with unbounded transition and average payoff rates (Cited by: 2)
4
Authors: ZHANG WenZhao, GUO XianPing. Science China Mathematics (SCIE), 2012, No. 11, pp. 2405-2416 (12 pages)
This paper studies two-person nonzero-sum games for denumerable continuous-time Markov chains determined by transition rates, with an expected average criterion. The transition rates are allowed to be unbounded, and the payoff functions may be unbounded from above and from below. We give suitable conditions under which the existence of a Nash equilibrium is ensured. More precisely, using the so-called "vanishing discount" approach, a Nash equilibrium for the average criterion is obtained as a limit point of a sequence of equilibrium strategies for the discounted criterion as the discount factors tend to zero. Our results are illustrated with a birth-and-death game.
Keywords: nonzero-sum game; expected average criterion; Nash equilibrium; unbounded transition rates; unbounded payoff function
Optimal synchronization control for multi-agent systems with input saturation: a nonzero-sum game (Cited by: 1)
5
Authors: Hongyang LI, Qinglai WEI. Frontiers of Information Technology & Electronic Engineering (SCIE, EI, CSCD), 2022, No. 7, pp. 1010-1019 (10 pages)
This paper presents a novel optimal synchronization control method for multi-agent systems with input saturation. Multi-agent game theory is introduced to transform the optimal synchronization control problem into a multi-agent nonzero-sum game. Then, the Nash equilibrium can be achieved by solving the coupled Hamilton–Jacobi–Bellman (HJB) equations with nonquadratic input energy terms. A novel off-policy reinforcement learning method is presented to obtain the Nash equilibrium solution without the system models, and critic neural networks (NNs) and actor NNs are introduced to implement the presented method. Theoretical analysis is provided, which shows that the iterative control laws converge to the Nash equilibrium. Simulation results show the good performance of the presented method.
Keywords: optimal synchronization control; multi-agent systems; nonzero-sum game; adaptive dynamic programming; input saturation; off-policy reinforcement learning; policy iteration
Three-dimensional nonlinear H_2/H_∞ guidance law based upon approach of solving the state feedback Nash balance point
6
Authors: 桑保华, 姜长生. Journal of Harbin Institute of Technology (New Series) (EI, CAS), 2010, No. 3, pp. 383-388 (6 pages)
Based on the theory of the nonlinear quadratic two-person nonzero-sum differential game, it is noted that the time-limited mixed H2/H∞ control problem can be turned into the problem of solving for the state feedback Nash balance point. On this basis, a theorem on the solution of the state feedback control is given, and the Lyapunov stability of the nonlinear system under this control is proved. This solution is then used to design the nonlinear H2/H∞ guidance law for the relative motion between the missile and the target in three-dimensional (3D) space. By solving two coupled Hamilton-Jacobi partial differential inequalities (HJPDI), a control with more robust stability and more robust performance is obtained. For different H∞ performance indexes, the corresponding weighting factors of the control are designed analytically. Finally, simulations are carried out under different robust performance indexes, under different initial conditions, and for the interception of different maneuvering targets. All results indicate that the designed law is valid.
Keywords: nonlinear system; mixed H2/H∞ control; state feedback Nash balance point; two-person nonzero-sum differential game; three-dimensional guidance law
BACKWARD LINEAR-QUADRATIC STOCHASTIC OPTIMAL CONTROL AND NONZERO-SUM DIFFERENTIAL GAME PROBLEM WITH RANDOM JUMPS
7
Author: Detao ZHANG. Journal of Systems Science & Complexity (SCIE, EI, CSCD), 2011, No. 4, pp. 647-662 (16 pages)
This paper studies the existence and uniqueness of solutions of fully coupled forward-backward stochastic differential equations with Brownian motion and random jumps. The result is applied to solve a linear-quadratic optimal control problem and a nonzero-sum differential game of backward stochastic differential equations. The optimal control and the Nash equilibrium point are explicitly derived. The solvability of a kind of Riccati equation is also discussed. All these results extend those of Lim and Zhou (2001) and Yu and Ji (2008).
Keywords: backward stochastic differential equations; nonzero-sum differential game; optimal control; Poisson processes; Riccati equation
A Sufficient Maximum Principle for Nonzero-Sum Stochastic Differential Games with Noisy Memory
8
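Authors: 张峰, 梁嘉玮. 《山东大学学报(理学版)》 (CAS, CSCD, PKU Core), 2024, No. 10, pp. 46-52 (7 pages)
This paper studies a class of nonzero-sum stochastic differential games whose main feature is that the state and control variables may involve several forms of delay. The state variable may involve distributed delay, discrete delay, and noisy memory; the control variable may involve distributed delay and discrete delay. The control domain is convex. Using the maximum principle approach, sufficient conditions satisfied by the equilibrium point of the game are established. Finally, an example is studied and an explicit expression for the equilibrium point is given.
Keywords: nonzero-sum stochastic differential game; time delay; noisy memory; equilibrium point; maximum principle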
Nonzero-Sum Differential Games of Partially Observable Backward Stochastic Systems with Jumps and Their Applications
9
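Authors: 陈晓兰, 王凯凯, 朱庆峰. 《工程数学学报》 (CSCD, PKU Core), 2023, No. 5, pp. 738-750 (13 pages)
Differential game theory studies the game process in which two or more players simultaneously apply their controls to a dynamic system described by differential equations, each pursuing its own optimal objective; it has attracted wide attention for its interesting mathematical properties and its value in economic applications. This paper studies a class of nonzero-sum differential games of partially observable backward stochastic systems with jumps, in which the game system involves jump processes and each player has a different observation equation. For this partially observable stochastic differential game, under the condition that the control domain is convex, a maximum principle for the Nash equilibrium point is established by convex variation and duality techniques; under suitable convexity and concavity assumptions, the necessary optimality condition is shown to be sufficient as well, which yields a verification theorem. Applying this maximum principle, a linear-quadratic (LQ) game of partially observable backward stochastic systems with jumps is studied and the unique optimal control of the LQ game is obtained, where the state and adjoint equations form a class of forward-backward stochastic differential equations with jumps. Since LQ models are commonly used to describe many financial and economic phenomena, these LQ game results are expected to find wide application in those fields.
Keywords: backward stochastic differential equation; Poisson process; nonzero-sum stochastic differential game; maximum principle; Nash equilibrium point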
Discounted Model of Nonzero-Sum Stochastic Games with Unbounded Payoff Functions
10
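Authors: 杨洁, 郭先平. 《中山大学学报(自然科学版)》 (CAS, CSCD, PKU Core), 2008, No. 5, pp. 23-27, 36 (6 pages)
This paper discusses the discounted model of discrete-time, countable-state nonzero-sum stochastic games whose payoff functions may be unbounded both from above and from below. Under the "drift" and "continuity-compactness" conditions commonly used in zero-sum stochastic games, the existence of a Nash equilibrium point is proved by Fan's fixed point theorem.
Keywords: nonzero-sum stochastic game; expected discounted payoff criterion; Nash equilibrium point; countable state space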
Linear-Quadratic Nonzero-Sum Stochastic Differential Games under Partially Observable Information (Cited by: 3)
11
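Author: 王光臣. 《山东大学学报(理学版)》 (CAS, CSCD, PKU Core), 2007, No. 6, pp. 12-15 (4 pages)
Combining the theory of forward-backward stochastic differential equations with filtering techniques, this paper gives a Nash equilibrium point for a class of linear-quadratic nonzero-sum stochastic differential games under partially observable information.
Keywords: forward-backward stochastic differential equation; nonzero-sum differential game; Nash equilibrium point; filtering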