
A Method of Heuristic Reinforcement Learning Based on Acquired Path Guiding Knowledge
Abstract: To improve the efficiency and convergence speed of reinforcement learning, a heuristic reinforcement learning method based on acquired path guiding knowledge (PHQL) is proposed. With PHQL, no prior knowledge needs to be embedded in the agent in advance: while the agent updates the Q table in each episode, path knowledge for every state is also built, revised, and optimized autonomously. The acquired path knowledge is then used to guide and accelerate the agent's subsequent learning, reducing the blindness of the learning process. The execution probabilities and selection mechanisms of the three kinds of behavior in PHQL, namely exploration, exploitation, and heuristic guidance, are analyzed, and a scheme in which the behavior-selection probabilities change gradually over time is proposed. PHQL is validated and analyzed on a path-search problem and compared with standard Q-learning and several related reinforcement learning algorithms. The experimental results show that the proposed method noticeably accelerates the learning process and markedly improves convergence.
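The abstract describes the mechanism only at a high level. The following Python sketch is an illustration of that idea, not the authors' published algorithm: a tabular Q-learner maintains a separate path-knowledge table built from goal-reaching episodes, and chooses among three behaviors (exploration, exploitation, heuristic guidance) with probabilities that shift over time. The grid-world environment, the linear probability schedule, and the rule that path knowledge is copied from successful trajectories are all assumptions made for this example; the paper's actual update rules may differ.

```python
# Illustrative sketch only (not the paper's published code): tabular Q-learning on a
# small grid world with a three-way action selector (explore / exploit / heuristic)
# whose probabilities drift over time, plus a simple "path knowledge" table rebuilt
# from episodes that reach the goal. Environment and schedule are assumptions.
import random

GRID_W, GRID_H = 8, 8
START, GOAL = (0, 0), (7, 7)
ACTIONS = [(1, 0), (-1, 0), (0, 1), (0, -1)]           # right, left, down, up

def step(state, action):
    """Deterministic grid transition with a -1 step cost and +100 at the goal."""
    x, y = state
    dx, dy = action
    nxt = (min(max(x + dx, 0), GRID_W - 1), min(max(y + dy, 0), GRID_H - 1))
    return nxt, (100.0 if nxt == GOAL else -1.0), nxt == GOAL

Q = {(x, y): [0.0] * len(ACTIONS) for x in range(GRID_W) for y in range(GRID_H)}
path_knowledge = {}                                     # state -> preferred action index

def select_action(state, episode, total_episodes):
    """Mix exploration, exploitation, and heuristic guidance; the mix shifts with time."""
    progress = episode / total_episodes
    p_explore = max(0.05, 0.5 * (1.0 - progress))       # exploration fades out
    p_heuristic = 0.4 * progress if state in path_knowledge else 0.0   # guidance grows
    r = random.random()
    if r < p_explore:
        return random.randrange(len(ACTIONS))           # explore: random action
    if r < p_explore + p_heuristic:
        return path_knowledge[state]                     # heuristic: follow path knowledge
    return max(range(len(ACTIONS)), key=lambda a: Q[state][a])   # exploit: greedy on Q

def run(total_episodes=300, alpha=0.1, gamma=0.95):
    for episode in range(total_episodes):
        state, trajectory = START, []
        for _ in range(400):                             # step limit per episode
            a = select_action(state, episode, total_episodes)
            nxt, reward, done = step(state, ACTIONS[a])
            Q[state][a] += alpha * (reward + gamma * max(Q[nxt]) - Q[state][a])
            trajectory.append((state, a))
            state = nxt
            if done:
                # Path knowledge is (re)built from goal-reaching episodes, so later
                # episodes can be steered along previously successful routes.
                for s, act in trajectory:
                    path_knowledge[s] = act
                break
    return Q, path_knowledge

if __name__ == "__main__":
    run()
```

The gradual shift of probability mass away from exploration and toward heuristic guidance mirrors the abstract's claim that acquired path knowledge increasingly steers later learning while reducing blind exploration.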
Source: Journal of Sichuan University (Engineering Science Edition), 2012, No. 5, pp. 136-142 (7 pages); indexed in EI, CAS, CSCD, and the Peking University Core Journal list.
Funding: National Natural Science Foundation of China (Grants 60971088 and 60571048).
Keywords: PHQL; Q-learning; reinforcement learning; path planning; knowledge heuristic approach.