
一种结合演示数据和演化优化的强化学习方法

Reinforcement learning method via combining demonstration data and evolutionary optimization
Abstract Reinforcement learning studies how an agent learns an optimal policy from its interactions with the environment so as to maximize the long-term reward. Because environment feedback arrives only after a sequence of actions, reinforcement learning must search a huge decision space, and effective search is the key to successful learning. Previous work has explored policy search from several angles. On the search-algorithm side, results show that direct policy search based on evolutionary optimization can outperform traditional methods; on the external-information side, user-provided demonstrations can effectively improve reinforcement-learning performance. The combination of these two effective approaches, however, has rarely been studied. This paper investigates combining user demonstrations with evolutionary optimization and proposes the iNEAT+Q algorithm, which incorporates demonstration data into evolutionary reinforcement learning in two ways: by pre-training the neural network on the demonstrations, and by building the demonstrations into the fitness function that guides the evolutionary search. A preliminary empirical study shows that iNEAT+Q clearly outperforms NEAT+Q, a classical evolutionary reinforcement-learning method that uses no demonstration data.
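The two hooks named in the abstract, pre-training the policy network on demonstrations and folding a demonstration-agreement term into the evolutionary fitness, can be shown in miniature. Below is a minimal sketch, not the authors' implementation: the one-layer softmax policy, the toy environment, the simple (1+10) hill-climbing loop standing in for NEAT's neuroevolution, and the weighting parameter lam are all illustrative assumptions.

```python
# Sketch (illustrative, not the paper's code) of the two uses of demonstration
# data described for iNEAT+Q: (1) supervised pre-training of the policy
# network, and (2) a demonstration-agreement term added to the fitness that
# drives the evolutionary search.
import numpy as np

rng = np.random.default_rng(0)
STATE_DIM, N_ACTIONS = 4, 2

def init_params():
    # One-layer softmax policy; NEAT would evolve weights and topology.
    return rng.normal(scale=0.1, size=(STATE_DIM, N_ACTIONS))

def act_probs(params, state):
    logits = state @ params
    e = np.exp(logits - logits.max())
    return e / e.sum()

# --- (1) pre-train on (state, action) demonstrations by gradient ascent ---
def pretrain(params, demo_states, demo_actions, lr=0.5, epochs=200):
    for _ in range(epochs):
        for s, a in zip(demo_states, demo_actions):
            p = act_probs(params, s)
            grad = -np.outer(s, p)      # gradient of log p[a] for softmax
            grad[:, a] += s
            params += lr * grad / len(demo_states)
    return params

# --- (2) fitness = environment return + demonstration agreement ---
def demo_agreement(params, demo_states, demo_actions):
    preds = [int(np.argmax(act_probs(params, s))) for s in demo_states]
    return float(np.mean([p == a for p, a in zip(preds, demo_actions)]))

def env_return(params):
    # Stand-in environment: reward for matching the sign of the first feature.
    total = 0.0
    for _ in range(20):
        s = rng.normal(size=STATE_DIM)
        a = int(np.argmax(act_probs(params, s)))
        total += 1.0 if a == int(s[0] > 0) else 0.0
    return total / 20

def fitness(params, demo_states, demo_actions, lam=0.5):
    return env_return(params) + lam * demo_agreement(params, demo_states, demo_actions)

# Toy demonstrations: "expert" picks action 1 iff the first feature is positive.
demo_states = rng.normal(size=(30, STATE_DIM))
demo_actions = (demo_states[:, 0] > 0).astype(int)

# Seed the search with the pre-trained network, then run a crude (1+10)
# hill-climbing loop as a stand-in for NEAT's population-based search.
best = pretrain(init_params(), demo_states, demo_actions)
best_fit = fitness(best, demo_states, demo_actions)
for gen in range(30):
    children = [best + rng.normal(scale=0.05, size=best.shape) for _ in range(10)]
    for c in children:
        f = fitness(c, demo_states, demo_actions)
        if f > best_fit:
            best, best_fit = c, f
print(f"final fitness: {best_fit:.3f}")
```

One design choice this exposes is how heavily the demonstration term weighs in the fitness: annealing lam toward zero over generations would let the demonstrations steer early search without constraining the final policy.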
Authors: 宋拴 (Song Shuan), 俞扬 (Yu Yang)
Source: Computer Engineering and Applications (《计算机工程与应用》, CSCD), 2014, No. 11: 115-119, 129 (6 pages)
Funding: Natural Science Foundation of Jiangsu Province, Youth Program (No. BK2012303)
Keywords: reinforcement learning; evolutionary algorithm; learning from demonstrations; neural network

References (15 in total; 10 listed)

  • 1 Sutton R S, Barto A G. Reinforcement learning: an introduction[M]. Cambridge: MIT Press, 1998.
  • 2 Wiering M, van Otterlo M. Reinforcement learning: state of the art[M]. New York: Springer-Verlag, 2012.
  • 3 Gao Yang, Chen Shifu, Lu Xin. A survey of research on reinforcement learning[J]. Acta Automatica Sinica, 2004, 30(1): 86-100. (in Chinese)
  • 4 Wang Ruixia, Sun Liang, Ruan Xiaogang. Control of a double inverted pendulum based on reinforcement learning[J]. Computer Simulation, 2006, 23(4): 305-308. (in Chinese)
  • 5 Moriarty D E, Schultz A C, Grefenstette J J. Evolutionary algorithms for reinforcement learning[J]. Journal of Artificial Intelligence Research, 1999, 11: 241-276.
  • 6 Whiteson S, Stone P. Evolutionary function approximation for reinforcement learning[J]. Journal of Machine Learning Research, 2006, 7: 877-917.
  • 7 Stanley K O, Miikkulainen R. Evolving neural networks through augmenting topologies[J]. Evolutionary Computation, 2002, 10(2): 99-127.
  • 8 Stanley K O, Miikkulainen R. Efficient reinforcement learning through evolving neural network topologies[C]//Proceedings of the Genetic and Evolutionary Computation Conference (GECCO-2002), San Francisco, CA, 2002: 569-577.
  • 9 Whiteson S, Stone P. Sample-efficient evolutionary function approximation for reinforcement learning[C]//Proceedings of the 21st National Conference on Artificial Intelligence, Boston, MA, 2006.
  • 10 Argall B D, Chernova S, Veloso M, et al. A survey of robot learning from demonstration[J]. Robotics and Autonomous Systems, 2009, 57(5): 469-483.

