APPLICATION OF HIERARCHICAL REINFORCEMENT LEARNING IN ENGINEERING DOMAIN
Cited by: 3
Funding
This work was supported partly by the National Natural Science Foundation of China under Grant No. 69975013.
References (11)
[1] Bao, G., C. G. Cassandras, T. E. Djaferis, A. D. Gandhi, and D. P. Looze, "Elevator dispatchers for down peak traffic", ECE Department Technical Report, University of Massachusetts, 1994. Cited by: 1
[2] Barto, A. G., S. Mahadevan, "Recent advances in hierarchical reinforcement learning", Discrete Event Dynamic Systems: Theory and Applications, Vol. 13, pp. 41-77, 2003. Cited by: 1
[3] Bradtke, S. J. and M. O. Duff, "Reinforcement learning methods for continuous-time Markov decision problems", Advances in Neural Information Processing Systems 7, Cambridge, MA, 1995. Cited by: 1
[4] Crites, R. H. and A. G. Barto, "Improving elevator performance using reinforcement learning", Advances in Neural Information Processing Systems 8, pp. 1017-1023, 1996. Cited by: 1
[5] Mahadevan, S., M. Nicholas, D. Tapas, and G. Abhijit, "Self-improving factory simulation using continuous-time average-reward reinforcement learning", Proceedings of the 14th International Conference on Machine Learning (ICML '97), Nashville, TN, 1997. Cited by: 1
[6] Mataric, M., "Reinforcement learning in the multi-robot domain", Autonomous Robots, Vol. 4, No. 1, pp. 73-83, 1997. Cited by: 1
[7] Parr, R., "Hierarchical control and learning for Markov decision processes", Ph.D. dissertation, University of California, Berkeley, CA, 1998. Cited by: 1
[8] Rajbala, M., M. Sridhar, and G. Mohammad, "Hierarchical multi-agent reinforcement learning", Proceedings of the Fifth International Conference on Autonomous Agents, pp. 246-253, 2001. Cited by: 1
[9] Sutton, R. S. and A. G. Barto, Reinforcement Learning: An Introduction, Cambridge, MA: MIT Press, 1998. Cited by: 1
[10] Szepesvari, C. and M. L. Littman, "A unified analysis of value-function-based reinforcement learning algorithms", Neural Computation, Vol. 11, pp. 2017-2060, 1999. Cited by: 1
Co-cited literature (46)
1. Su Chang, Gao Yang, Chen Shifu, Chen Zhaoqian. Research on an algorithm for autonomously generating options in SMDP environments [J]. Pattern Recognition and Artificial Intelligence, 2005, 18(6): 679-684. Cited by: 9
2. Peng Zhiping, Peng Hong, Zheng Qilun. Research on a bilateral multi-issue autonomous negotiation model [J]. Journal of Electronics & Information Technology, 2007, 29(3): 733-738. Cited by: 12
3. Gao Yang, Zhou Ruyi, Wang Hao, Cao Zhixin. Research on average-reward reinforcement learning algorithms [J]. Chinese Journal of Computers, 2007, 30(8): 1372-1378. Cited by: 38
4. FISCHER F, ROVATSOS M, WEISS G. Hierarchical reinforcement learning in communication-mediated multiagent coordination [C]// Proc of the 3rd International Conference on Autonomous Agents and Multiagent Systems. New York: ACM Press, 2004. Cited by: 1
5. HENGST B. Discovering hierarchy in reinforcement learning [D]. Sydney: University of New South Wales, 2003. Cited by: 1
6. SKELLY M M. Hierarchical reinforcement learning with function approximation for adaptive control [D]. Ohio: Case Western Reserve University, 2004. Cited by: 1
7. UTHER W T B. Tree based hierarchical reinforcement learning [D]. Pittsburgh: Carnegie Mellon University, 2002. Cited by: 1
8. BELLMAN R E, DREYFUS S E. Applied dynamic programming [M]. New Jersey: Princeton University Press, 1962. Cited by: 1
9. WATKINS C, DAYAN P. Q-learning [J]. Machine Learning, 1992, 8(3): 279-292. Cited by: 1
10. PARR R. Hierarchical control and learning for Markov decision processes [D]. Berkeley, California: University of California, 1998. Cited by: 1
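Citing literature (3)
1. Peng Zhiping, Li Shaoping. A PSO-based hierarchical policy search algorithm [J]. Pattern Recognition and Artificial Intelligence, 2008, 21(1): 98-103. Cited by: 1
2. Peng Zhiping, Li Shaoping. Research progress in hierarchical reinforcement learning [J]. Application Research of Computers, 2008, 25(4): 974-978. Cited by: 7
3. Tang Hao, Zhang Xiaoyan, Han Jianghong, Zhou Lei. An Option algorithm based on continuous-time semi-Markov decision processes [J]. Chinese Journal of Computers, 2014, 37(9): 2027-2037. Cited by: 2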
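Secondary citing literature (10)
1. Song Jiong, Jin Zhao, Yang Weihe. A function method for accelerating reinforcement learning in machine learning [J]. Journal of Yunnan University (Natural Sciences Edition), 2011, 33(S2): 176-181.
2. Lian Zuozheng, Wang Haizhen, Deng Wenxin, Teng Yanping. Research on agent negotiation using memory-based evolutionary learning [J]. Computer Engineering and Applications, 2009, 45(19): 131-133. Cited by: 1
3. Dai Zhaohui, Yuan Jiaohong, Wu Min, Chen Xin. Dynamic hierarchical reinforcement learning based on probabilistic models [J]. Control Theory & Applications, 2011, 28(11): 1595-1600. Cited by: 2
4. Li Zhi, Hu Kun, Yu Xueli. Research on hierarchical reinforcement learning based on a semi-Markov game model [J]. Computer Engineering and Design, 2012, 33(9): 3558-3562. Cited by: 2
5. Tang Hao, Zhang Xiaoyan, Han Jianghong, Zhou Lei. An Option algorithm based on continuous-time semi-Markov decision processes [J]. Chinese Journal of Computers, 2014, 37(9): 2027-2037. Cited by: 2
6. Peng Zhiping, Zhou Xiaoke, Sun Zhiyi. An adaptive virtual machine configuration method combining Options and ant colony optimization [J]. Journal of Chinese Computer Systems, 2015, 36(4): 801-805.
7. Wang Lei. A method for constructing abstract action trees based on example trajectories [J]. Computer and Modernization, 2016(6): 85-90. Cited by: 1
8. Zhu Fei, Xu Zhipeng, Liu Quan, Fu Yuchen, Wang Hui. An online hierarchical reinforcement learning method based on interruptible Options [J]. Journal on Communications, 2016, 37(6): 65-74. Cited by: 4
9. Guo Lexin, Zhang Xiaoshun, Tan Min, Yu Tao. An optimal carbon-energy combined flow algorithm for power grids based on swarm-intelligence reinforcement learning [J]. Electrical Measurement & Instrumentation, 2017, 54(1): 1-7. Cited by: 4
10. Cao Jie, Shao Zixuan, Hou Liang. Research on the U-turn problem for autonomous vehicles based on hierarchical reinforcement learning [J]. Application Research of Computers, 2022, 39(10): 3008-3012. Cited by: 1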