Journal Articles: 10 articles found
1. Computational intelligence interception guidance law using online off-policy integral reinforcement learning
Authors: WANG Qi, LIAO Zhizhong. Journal of Systems Engineering and Electronics (SCIE, CSCD), 2024, No. 4, pp. 1042-1052 (11 pages)
The missile interception problem can be regarded as a two-person zero-sum differential game, which depends on the solution of the Hamilton-Jacobi-Isaacs (HJI) equation. It has been proved impossible to obtain a closed-form solution due to the nonlinearity of the HJI equation, and many iterative algorithms have been proposed to solve it. The simultaneous policy updating algorithm (SPUA) is an effective algorithm for solving the HJI equation, but it is an on-policy integral reinforcement learning (IRL) method: for an online implementation of SPUA, the disturbance signals need to be adjustable, which is unrealistic. In this paper, an off-policy IRL algorithm based on SPUA is proposed that makes no use of any knowledge of the system dynamics. A neural-network-based online adaptive critic implementation scheme of the off-policy IRL algorithm is then presented. Based on the online off-policy IRL method, a computational intelligence interception guidance (CIIG) law is developed for intercepting high-maneuvering targets. As a model-free method, the CIIG law intercepts targets using system data measured online. Its effectiveness is verified in two missile-target engagement scenarios.
Keywords: two-person zero-sum differential games; Hamilton-Jacobi-Isaacs (HJI) equation; off-policy integral reinforcement learning (IRL); online learning; computational intelligence interception guidance (CIIG) law
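For reference, the two-person zero-sum formulation above leads to an HJI equation whose standard form (a textbook statement under assumed control-affine dynamics and notation, not quoted from the paper) is the following. For dynamics \(\dot{x} = f(x) + g(x)u + k(x)w\), with interceptor control \(u\), target maneuver \(w\), and cost \(\int_0^\infty \big(Q(x) + u^\top R u - \gamma^2 w^\top w\big)\,dt\):

\[
0 = Q(x) + \nabla V^\top f(x) - \frac{1}{4}\,\nabla V^\top g(x) R^{-1} g(x)^\top \nabla V + \frac{1}{4\gamma^2}\,\nabla V^\top k(x) k(x)^\top \nabla V ,
\]

with saddle-point policies \(u^* = -\frac{1}{2} R^{-1} g(x)^\top \nabla V\) and \(w^* = \frac{1}{2\gamma^2} k(x)^\top \nabla V\). The quadratic terms in \(\nabla V\) are the nonlinearity that rules out a closed-form solution and motivates iterative schemes such as SPUA.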
2. AInvR: Adaptive Learning Rewards for Knowledge Graph Reasoning Using Agent Trajectories (Cited by 1)
Authors: Hao Zhang, Guoming Lu, Ke Qin, Kai Du. Tsinghua Science and Technology (SCIE, EI, CAS, CSCD), 2023, No. 6, pp. 1101-1114 (14 pages)
Multi-hop reasoning for incomplete knowledge graphs (KGs) demonstrates excellent interpretability with decent performance. Reinforcement learning (RL) based approaches formulate multi-hop reasoning as a typical sequential decision problem. An intractable shortcoming of multi-hop reasoning with RL is that sparse reward signals make performance unstable. Current mainstream methods apply heuristic reward functions to counter this challenge. However, the inaccurate rewards caused by heuristic functions guide the agent to improper inference paths and unrelated object entities. To this end, we propose a novel adaptive inverse reinforcement learning (IRL) framework for multi-hop reasoning, called AInvR. (1) To counter missing and spurious paths, we replace heuristic rule rewards with an adaptive rule-reward learning mechanism based on the agent's inference trajectories; (2) to alleviate the impact of object entities over-rewarded by inaccurate reward shaping and rules, we propose an adaptive negative-hit reward learning mechanism based on the agent's sampling strategy; (3) to further explore diverse paths and mitigate the influence of missing facts, we design a reward dropout mechanism that randomly masks and perturbs reward parameters during reward learning. Experimental results on several benchmark knowledge graphs demonstrate that our method is more effective than existing multi-hop approaches.
Keywords: knowledge graph reasoning (KGR); inverse reinforcement learning (IRL); multi-hop reasoning
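A minimal sketch of the reward dropout idea described in the abstract, assuming a linear reward parameter vector; the function name, shapes, and hyperparameters are illustrative, not AInvR's actual implementation:

import numpy as np

def reward_dropout(theta, drop_rate=0.2, noise_std=0.01, rng=None):
    # Randomly mask and perturb reward parameters, per the idea in the
    # abstract; drop_rate and noise_std are assumed hyperparameters.
    rng = rng or np.random.default_rng()
    mask = rng.random(theta.shape) >= drop_rate   # zero out ~20% of params
    noise = rng.normal(0.0, noise_std, size=theta.shape)
    return (theta + noise) * mask

# usage: jitter a reward weight vector before each reward-learning update,
# encouraging the agent to explore paths the current rewards would miss
theta = np.ones(8)
theta_tilde = reward_dropout(theta)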
3. Recognition and interfere deceptive behavior based on inverse reinforcement learning and game theory
Authors: ZENG Yunxiu, XU Kai. Journal of Systems Engineering and Electronics (SCIE, EI, CSCD), 2023, No. 2, pp. 270-288 (19 pages)
In real-time strategy (RTS) games, the ability to recognize other players' goals is important for creating artificial intelligence (AI) players. However, most current goal recognition methods do not take the player's deceptive behavior into account, although it often occurs in RTS game scenarios, resulting in poor recognition results. To solve this problem, this paper proposes goal recognition for deceptive agents, an extended goal recognition method that applies deductive reasoning (from the general to the specific) to model the deceptive agent's behavioral strategy. First, a general deceptive behavior model is proposed to abstract the features of deception; these features are then used to construct the behavior strategy that best matches the deceiver's historical behavior data via the inverse reinforcement learning (IRL) method. Finally, to interfere with the implementation of the deceptive behavior, we construct a game model describing the confrontation scenario and derive the most effective interference measures.
Keywords: deceptive path planning; inverse reinforcement learning (IRL); game theory; goal recognition
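The confrontation between recognizer and deceiver can be cast as a matrix game; a hedged sketch of solving such a zero-sum game for an optimal mixed interference strategy by linear programming follows. The payoff matrix is a placeholder, not the paper's game model:

import numpy as np
from scipy.optimize import linprog

def solve_zero_sum(payoff):
    # Row player's optimal mixed strategy: maximize the game value v
    # subject to payoff^T x >= v for every column, sum(x) = 1, x >= 0.
    payoff = np.asarray(payoff, dtype=float)
    m, n = payoff.shape
    c = np.zeros(m + 1)
    c[-1] = -1.0                                    # linprog minimizes -v
    A_ub = np.hstack([-payoff.T, np.ones((n, 1))])  # v - payoff^T x <= 0
    b_ub = np.zeros(n)
    A_eq = np.hstack([np.ones((1, m)), np.zeros((1, 1))])
    b_eq = np.array([1.0])
    bounds = [(0, None)] * m + [(None, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.x[:m], res.x[-1]                     # mixed strategy, game value

# placeholder payoff: rows = interference measures, columns = deceiver moves
strategy, value = solve_zero_sum([[3.0, -1.0], [-2.0, 4.0]])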
4. Modified reward function on abstract features in inverse reinforcement learning (Cited by 1)
Authors: Shen-yi CHEN, Hui QIAN, Jia FAN, Zhuo-jun JIN, Miao-liang ZHU. Journal of Zhejiang University-Science C (Computers and Electronics) (SCIE, EI), 2010, No. 9, pp. 718-723 (6 pages)
We improve inverse reinforcement learning (IRL) by applying dimension reduction methods to automatically extract abstract features from human-demonstrated policies, to handle cases where features are either unknown or too numerous. The importance rating of each abstract feature is incorporated into the reward function. Simulation is performed on a task of driving on a five-lane highway, where the controlled car has the highest fixed speed among all cars. Performance is on average almost 10.6% better with importance ratings than without them.
Keywords: importance rating; abstract feature; feature extraction; inverse reinforcement learning (IRL); Markov decision process (MDP)
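A minimal sketch of the abstract-feature idea, using plain PCA as a stand-in for the dimension-reduction step and explained variance as the importance rating; the paper's actual extraction method and rating formula may differ:

import numpy as np

def abstract_features(demo_features, k=3):
    # demo_features: (num_samples, num_raw_features) from demonstrations.
    # Top-k principal directions act as abstract features; their
    # explained-variance ratios serve here as importance ratings.
    X = demo_features - demo_features.mean(axis=0)
    _, s, Vt = np.linalg.svd(X, full_matrices=False)
    importance = (s[:k] ** 2) / np.sum(s ** 2)
    return Vt[:k], importance

def reward(raw_features, components, importance):
    # Importance-rated linear reward over the abstract features.
    return float(importance @ (components @ raw_features))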
5. Adaptive Optimal Control of Space Tether System for Payload Capture via Policy Iteration (Cited by 2)
Authors: FENG Yiting, ZHANG Ming, GUO Wenhao, WANG Changqing. Transactions of Nanjing University of Aeronautics and Astronautics (EI, CSCD), 2021, No. 4, pp. 560-570 (11 pages)
The libration control problem of a space tether system (STS) after payload capture is studied. The capture process causes the tether to swing and deviate from its nominal position, which can result in mission failure. Because the inertial parameters after capturing the payload are unknown, an adaptive optimal control scheme based on policy iteration is developed to stabilize the uncertain dynamic system in the post-capture phase. By introducing an integral reinforcement learning (IRL) scheme, the algebraic Riccati equation (ARE) can be solved online without known dynamics. To avoid the computational burden of the iteration equations, an online implementation of the policy iteration algorithm via the least-squares solution method is provided. Finally, the effectiveness of the algorithm is validated by numerical simulations.
Keywords: space tether system (STS); payload capture; policy iteration; integral reinforcement learning (IRL); state feedback
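For orientation, the policy-iteration skeleton behind this scheme can be sketched with known dynamics (Kleinman's iteration for the continuous-time ARE); in the paper's data-driven setting, the Lyapunov-equation step is instead fit by least squares to measured state trajectories so the dynamics need not be known. All matrices and names below are assumptions:

import numpy as np
from scipy.linalg import solve_continuous_lyapunov

def policy_iteration_are(A, B, Q, R, K0, iters=50, tol=1e-9):
    # K0 must be a stabilizing initial gain.
    K = K0
    P = None
    for _ in range(iters):
        Ac = A - B @ K
        # policy evaluation: Ac^T P + P Ac = -(Q + K^T R K)
        P = solve_continuous_lyapunov(Ac.T, -(Q + K.T @ R @ K))
        K_new = np.linalg.solve(R, B.T @ P)   # policy improvement
        if np.linalg.norm(K_new - K) < tol:
            return P, K_new
        K = K_new
    return P, K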
6. A Survey of Deep Reinforcement Learning (Cited by 49)
Authors: 杨思明, 单征, 丁煜, 李刚伟. 《计算机工程》 (Computer Engineering) (CAS, CSCD, PKU Core), 2021, No. 12, pp. 19-29 (11 pages)
Deep reinforcement learning uses the feature-representation capacity of deep neural networks to approximate the state, action, and value functions of reinforcement learning, improving model performance; it is widely applied in video games, mechanical control, recommender systems, financial investment, and other fields. This survey reviews the main development of deep reinforcement learning methods and classifies them by current research goals. It analyzes and discusses algorithm convergence in tasks with high-dimensional state-action spaces, sample-efficiency improvements in complex application scenarios, exploration when reward functions are sparse or not clearly defined, and generalization in multi-task settings. It summarizes the state of research on these four classes of deep reinforcement learning methods and offers an outlook on future directions for the technology.
Keywords: deep learning; reinforcement learning; deep reinforcement learning; inverse reinforcement learning; model-based meta-learning
7. A Survey of Inverse Reinforcement Learning: Algorithms, Theory, and Applications
Authors: 宋莉, 李大字, 徐昕. 《自动化学报》 (Acta Automatica Sinica) (EI, CAS, CSCD, PKU Core), 2024, No. 9, pp. 1704-1723 (20 pages)
With improvements in high-dimensional feature representation and approximation capability, reinforcement learning (RL) has made remarkable progress in real-world problems such as games, optimal decision making, and intelligent driving. However, in the interaction between agent and environment, manually designing a reward function is difficult, which has motivated the research direction of inverse reinforcement learning (IRL). How to learn reward functions from expert demonstrations and perform policy optimization is an important research topic of great significance in artificial intelligence. This paper surveys recent advances in IRL algorithms: it first reviews new theoretical progress, then analyzes the challenges facing IRL and future development trends, and finally discusses application progress and prospects.
Keywords: reinforcement learning; inverse reinforcement learning; linear inverse reinforcement learning; deep inverse reinforcement learning; adversarial inverse reinforcement learning
8. A Longitudinal Autonomous Driving Decision-Making Method Based on Inverse Reinforcement Learning (Cited by 6)
Authors: 高振海, 闫相同, 高菲. 《汽车工程》 (Automotive Engineering) (EI, CSCD, PKU Core), 2022, No. 7, pp. 969-975 (7 pages)
Obtaining autonomous driving decision policies from human driver data is a current research hotspot in autonomous driving. Classical reinforcement learning decision methods mostly construct reward functions by hand from formulas for safety, comfort, and economy, and the resulting policies still differ considerably from human drivers. This paper uses the maximum-margin inverse reinforcement learning algorithm, taking driver data as expert demonstrations, to build the corresponding reward function and realize driver-like longitudinal autonomous driving decisions. Simulation results show that, compared with reinforcement learning, the inverse reinforcement learning method extracts the reward function automatically from driver data, reducing the difficulty of constructing it, and the resulting decision policy is more consistent with driver behavior.
Keywords: autonomous driving; decision-making algorithm; reinforcement learning; inverse reinforcement learning
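A hedged sketch of one max-margin step, in the spirit of the projection-style algorithm; the feature expectations (over assumed safety, comfort, and economy features) are inputs, and none of this is the paper's code:

import numpy as np

def max_margin_step(expert_mu, policy_mus):
    # Weight vector pointing from the learned policies' feature
    # expectations toward the expert's; its norm is the current margin.
    mu_bar = np.mean(policy_mus, axis=0)
    w = expert_mu - mu_bar
    margin = np.linalg.norm(w)
    return w / max(margin, 1e-12), margin

# outer loop (sketch): train a policy on reward w . phi(s), append its
# feature expectations to policy_mus, repeat until the margin is small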
9. A Survey of Inverse Reinforcement Learning (Cited by 1)
Authors: 张立华, 刘全, 黄志刚, 朱斐. 《软件学报》 (Journal of Software) (EI, CSCD, PKU Core), 2023, No. 10, pp. 4772-4803 (32 pages)
Inverse reinforcement learning (IRL), also known as inverse optimal control (IOC), is an important research method in reinforcement learning and imitation learning: it solves for a reward function from expert samples and then solves for the optimal policy under that reward, so as to imitate the expert policy. In recent years, IRL has produced rich results in imitation learning and has been widely applied to vehicle navigation, route recommendation, optimal robot control, and other problems. This paper first introduces the theoretical foundations of IRL. Then, starting from how the reward function is constructed, it discusses and analyzes IRL algorithms based on linear and nonlinear reward functions, including maximum-margin IRL, maximum-entropy IRL, maximum-entropy deep IRL, and generative adversarial imitation learning. It then surveys frontier research directions and compares representative algorithms, including IRL with incomplete state-action information, multi-agent IRL, IRL from non-optimal demonstrations, and guided IRL. Finally, it summarizes the key open problems and discusses future directions in both theory and applications.
Keywords: inverse reinforcement learning; imitation learning; generative adversarial imitation learning; inverse optimal control; reinforcement learning
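Of the surveyed families, generative adversarial imitation learning is the most compact to sketch: a discriminator is trained to separate expert from policy state-action pairs (binary cross-entropy, expert = 1, policy = 0), and its score becomes the policy's surrogate reward. Dimensions, architecture, and the -log(1 - D) reward form below are common choices assumed here, not taken from the survey:

import torch
import torch.nn as nn

class Discriminator(nn.Module):
    # Scores concatenated (state, action) vectors; high logit = expert-like.
    def __init__(self, obs_act_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_act_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, 1))

    def forward(self, sa):
        return self.net(sa)

def surrogate_reward(disc, sa):
    # Reward handed to the RL policy: -log(1 - D(s, a)).
    d = torch.sigmoid(disc(sa))
    return -torch.log(1.0 - d + 1e-8)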
10. Inverse-Reinforcement-Learning-Based Trajectory Planning Optimization for Intelligent Connected Vehicles (Cited by 2)
Authors: 彭浩楠, 唐明环, 查奇文, 王聪, 王伟达. 《北京理工大学学报》 (Transactions of Beijing Institute of Technology) (EI, CAS, CSCD, PKU Core), 2023, No. 8, pp. 820-831 (12 pages)
To address problems with current trajectory planning strategies, namely poor real-time performance, difficulty in calibrating the weights of optimization objectives, and poor interpretability of imitation learning methods, an inverse reinforcement learning method based on the maximum-entropy principle is proposed. By learning the optimization mechanism underlying experienced drivers' trajectories, it plans globally optimal lane-change expert trajectories that conform to human driving experience, laying a theoretical foundation for solving the real-time and interpretability problems of trajectory planning. Taking general-risk and high-risk scenarios as application cases, Matlab/Simulink simulations verify the feasibility and effectiveness of the proposed inverse reinforcement learning method for trajectory planning.
Keywords: intelligent connected vehicles; inverse reinforcement learning; trajectory planning; maximum-entropy principle
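A minimal numpy sketch of the maximum-entropy principle the paper builds on: trajectory probability proportional to exp(theta . f(tau)), with the reward weights updated toward the expert's average features. Candidate trajectories and their features are assumed inputs; this is not the paper's implementation:

import numpy as np

def maxent_irl_step(theta, expert_feats, candidate_feats, lr=0.05):
    # Gradient of the demonstration log-likelihood under the max-ent
    # model: expert feature mean minus the model's expected features.
    logits = candidate_feats @ theta
    p = np.exp(logits - logits.max())
    p /= p.sum()                                  # softmax over candidates
    grad = expert_feats.mean(axis=0) - p @ candidate_feats
    return theta + lr * grad

# after convergence, the planner ranks sampled lane-change candidates by
# theta . f(tau) and executes the top-scoring trajectory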