
Option-Critic Algorithm Based on Sub-Goal Quantity Optimization (基于优化子目标数的Option-Critic算法)
Abstract: Time abstraction is an important research direction in hierarchical reinforcement learning, and sub-goals are the core element from which time abstractions are formed. At present, most hierarchical reinforcement learning methods require sub-goals, or the number of sub-goals, to be specified manually. In many cases this not only demands considerable human intervention, but the chosen settings may not suit the scenario at hand; the problem is especially acute in unknown dynamic environments. To address this, we propose the Option-Critic algorithm based on Sub-goal Quantity Optimization (OC-SQO). The algorithm adds an exploration phase in which the agent, through simple interaction with the environment, obtains an initial estimate of the number of sub-goals suited to the application scenario and identifies sub-goals on that basis; it then generates the corresponding abstractions by policy gradient, each represented as a triple of initiation set, intra-option policy, and termination function, trains on this representation, and iterates: the abstractions obtained through interaction change the current state, and the solution is refined continually. OC-SQO can start execution from any state and requires no pre-specified sub-goals or parameters; during execution it generates the intra-option policies, the policy over options, and the termination functions by policy gradient, needing neither internal reward signals nor knowledge of the sub-goals, thereby minimizing manual intervention. Experiments verify the effectiveness of the algorithm.

Reinforcement learning has been extensively studied as a branch of machine learning in which an agent keeps interacting with the environment with the goal of maximizing long-term return, making it prominent in areas such as control and optimal scheduling. Deep reinforcement learning (DRL) handles large-scale, high-dimensional data such as video and images by extracting abstract representations and learning an optimal policy through the reinforcement learning component. It has become a research hotspot in artificial intelligence, and many algorithms have been developed. For example, the deep Q-network (DQN), one of the best-known deep reinforcement learning models, combines a convolutional neural network (CNN) with Q-learning and has been used to learn policies in complex environments with high-dimensional inputs. However, DQN performs poorly in sparse-reward environments and in large-scale state spaces. Hierarchical reinforcement learning was introduced to address these problems: the original problem space is decomposed into several sub-problem spaces, and the original large problem is solved by dealing with each sub-problem individually. However, hierarchical reinforcement learning tends to be effective only in tasks with discrete state/action spaces. Hierarchical deep reinforcement learning combines hierarchical reinforcement learning with deep learning; the idea is the same, but the sub-problems are solved with neural networks. Time abstraction is one of the most promising directions in hierarchical reinforcement learning, and the sub-goal is the key element from which time abstractions are produced. At present, however, sub-goals or the number of sub-goals must be specified manually, which lacks automation and generalizes poorly across scenarios. To solve this problem, we propose the Option-Critic algorithm based on Sub-goal Quantity Optimization (OC-SQO), which lets the agent estimate a suitable number of sub-goals through simple interaction with the environment, identify sub-goals on that basis, and generate the corresponding options by policy gradient, each represented as a triple of initiation set, intra-option policy, and termination function, with no internal reward signal or pre-specified sub-goals required. Experiments verify the effectiveness of the algorithm.
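The abstract represents each learned abstraction as an option: a triple of initiation set, intra-option policy, and termination function, executed under a policy over options. The following is a minimal Python sketch of that triple and its execution loop; the names (Option, run_option, episode) and the gym-like env.step API are illustrative assumptions, not the authors' implementation.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Set

# Hypothetical sketch of the option triple named in the abstract:
# (initiation set, intra-option policy, termination function).
@dataclass
class Option:
    initiation_set: Set[int]             # states where this option may be chosen
    policy: Callable[[int], int]         # intra-option policy: state -> action
    termination: Callable[[int], float]  # beta(s): probability of stopping in state s

def run_option(env, state: int, option: Option):
    """Execute one option until its termination function fires or the episode ends."""
    total_reward = 0.0
    while True:
        action = option.policy(state)
        state, reward, done = env.step(action)  # assumed gym-like environment API
        total_reward += reward
        if done or random.random() < option.termination(state):
            return state, total_reward, done

def episode(env, start_state: int, options: List[Option], policy_over_options):
    """One episode driven by a policy over options, Option-Critic style."""
    state, done = start_state, False
    while not done:
        candidates = [o for o in options if state in o.initiation_set]
        option = policy_over_options(state, candidates)
        state, _, done = run_option(env, state, option)
```

In OC-SQO, per the abstract, the intra-option policies, the policy over options, and the termination functions would all be parameterized and updated by policy gradient rather than fixed as above, and the number of options would follow from the sub-goal quantity estimated through the agent's initial interaction with the environment.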
Authors: 刘成浩 (LIU Cheng-Hao), 朱斐 (ZHU Fei), 刘全 (LIU Quan) (School of Computer Science and Technology, Soochow University, Suzhou, Jiangsu 215006; Provincial Key Laboratory for Computer Information Processing Technology (Soochow University), Suzhou, Jiangsu 215006)
Source: Chinese Journal of Computers (《计算机学报》; indexed in EI, CAS, CSCD, Peking University Core Journals), 2021, No. 9, pp. 1922-1933 (12 pages)
Funding: National Natural Science Foundation of China (61303108, 61772355); Major Program of the Natural Science Research of Jiangsu Higher Education Institutions (17KJA520004); Suzhou Key Industry Technology Innovation (Prospective Application Research) Project (SYG201804); Priority Academic Program Development of Jiangsu Higher Education Institutions (PAPD)
Keywords: hierarchical deep reinforcement learning; time abstraction; sub-goal; reinforcement learning; Option