
A Method of Heuristic Reinforcement Learning Based on Acquired Path Guiding Knowledge
Abstract: To improve the efficiency and convergence speed of reinforcement learning, a heuristic reinforcement learning method based on acquired path guiding knowledge (PHQL) is proposed. With PHQL, no prior knowledge needs to be embedded in the agent in advance: while the agent updates the Q table in each episode, path knowledge for every state is also built, revised, and optimized autonomously. The acquired path knowledge is then used to guide and accelerate the agent's subsequent learning, reducing the blindness of the learning process. The execution probabilities and selection mechanisms of the three kinds of behavior in PHQL, namely exploration, exploitation, and heuristic guidance, are analyzed, and a scheme in which the behavior-selection probabilities change gradually over time is proposed. PHQL is validated and analyzed on a path-search problem and compared with standard Q-learning and several related reinforcement learning algorithms. The experimental results show that the proposed method noticeably accelerates the learning process and markedly improves convergence.
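The abstract describes the mechanism only at a high level. The following Python sketch is an illustration of that idea, not the authors' published algorithm: a tabular Q-learner maintains a separate path-knowledge table built from goal-reaching episodes, and chooses among three behaviors (exploration, exploitation, heuristic guidance) with probabilities that shift over time. The grid-world environment, the linear probability schedule, and the rule that path knowledge is copied from successful trajectories are all assumptions made for this example; the paper's actual update rules may differ.

```python
# Illustrative sketch only (not the paper's published code): tabular Q-learning on a
# small grid world with a three-way action selector (explore / exploit / heuristic)
# whose probabilities drift over time, plus a simple "path knowledge" table rebuilt
# from episodes that reach the goal. Environment and schedule are assumptions.
import random

GRID_W, GRID_H = 8, 8
START, GOAL = (0, 0), (7, 7)
ACTIONS = [(1, 0), (-1, 0), (0, 1), (0, -1)]           # right, left, down, up

def step(state, action):
    """Deterministic grid transition with a -1 step cost and +100 at the goal."""
    x, y = state
    dx, dy = action
    nxt = (min(max(x + dx, 0), GRID_W - 1), min(max(y + dy, 0), GRID_H - 1))
    return nxt, (100.0 if nxt == GOAL else -1.0), nxt == GOAL

Q = {(x, y): [0.0] * len(ACTIONS) for x in range(GRID_W) for y in range(GRID_H)}
path_knowledge = {}                                     # state -> preferred action index

def select_action(state, episode, total_episodes):
    """Mix exploration, exploitation, and heuristic guidance; the mix shifts with time."""
    progress = episode / total_episodes
    p_explore = max(0.05, 0.5 * (1.0 - progress))       # exploration fades out
    p_heuristic = 0.4 * progress if state in path_knowledge else 0.0   # guidance grows
    r = random.random()
    if r < p_explore:
        return random.randrange(len(ACTIONS))           # explore: random action
    if r < p_explore + p_heuristic:
        return path_knowledge[state]                     # heuristic: follow path knowledge
    return max(range(len(ACTIONS)), key=lambda a: Q[state][a])   # exploit: greedy on Q

def run(total_episodes=300, alpha=0.1, gamma=0.95):
    for episode in range(total_episodes):
        state, trajectory = START, []
        for _ in range(400):                             # step limit per episode
            a = select_action(state, episode, total_episodes)
            nxt, reward, done = step(state, ACTIONS[a])
            Q[state][a] += alpha * (reward + gamma * max(Q[nxt]) - Q[state][a])
            trajectory.append((state, a))
            state = nxt
            if done:
                # Path knowledge is (re)built from goal-reaching episodes, so later
                # episodes can be steered along previously successful routes.
                for s, act in trajectory:
                    path_knowledge[s] = act
                break
    return Q, path_knowledge

if __name__ == "__main__":
    run()
```

The gradual shift of probability mass away from exploration and toward heuristic guidance mirrors the abstract's claim that acquired path knowledge increasingly steers later learning while reducing blind exploration.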
Source: Journal of Sichuan University (Engineering Science Edition), 2012, No. 5, pp. 136-142 (7 pages); indexed in EI, CAS, CSCD, and the Peking University Core Journal list.
Funding: National Natural Science Foundation of China (Grants 60971088 and 60571048).
Keywords: PHQL; Q-learning; reinforcement learning; path planning; knowledge heuristic approach.