Behavior Parameters' Optimization of Robot Soccer Based on Reinforcement Learning

Abstract: Reinforcement learning is a popular learning method in mobile robot research because its concept is concise and it is simple to implement. Current applications focus mainly on two areas: learning the mapping between discrete states and actions to obtain new behaviors, and coordinating existing behaviors to generate purposive behavior sequences. Using reinforcement learning to optimize behavior parameters is a practical way to improve a robot's performance. Researchers determine the behaviors' structure and control logic, and the robot determines the optimal parameters through online trial-and-error learning during actual operation. This method exploits human high-level intelligence while avoiding the difficulty that researchers cannot go deep into the robot's execution details, so it has clear practical value. We developed a robot soccer simulator in which each robot has three behaviors based on three different motor schemas. In this paper we introduce a reinforcement learning method that optimizes the weights of the motor schemas within each behavior by online trial-and-error learning. The method uses a Gaussian kernel to distribute each reward over the whole action space, so it can handle continuous actions. In robot soccer, any policy a robot uses can win only with some probability. We therefore fix the behavior parameters of one team and let the other team learn the optimal probability density distribution of its parameters. The simulation results show that the learning team's probability density distribution over behavior parameters converges, and that the team obtains the optimal parameters through online learning.
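The abstract describes the learning rule only at a high level. Each trial's reward is spread over the continuous action space with a Gaussian kernel, and the learner tracks a probability density over behavior parameters.

The Python sketch below illustrates that idea. It is a minimal sketch under our own assumptions, not the authors' implementation:

- It learns a single parameter (for instance one motor-schema weight) in [0, 1].
- The density is stored on a 51-point grid.
- The reward is 1 when the learning side scores and 0 otherwise.
- A hypothetical `hypothetical_trial` function stands in for one soccer episode against the fixed-parameter team.

The class and function names, the kernel width `sigma`, the learning rate `lr`, and the scoring curve that peaks at 0.7 are all illustrative choices.

```python
import math
import random


class GaussianKernelDensityLearner:
    """Learns a probability density over one continuous behavior parameter.

    Hypothetical sketch: the parameter (e.g. one motor-schema weight) lives in
    [low, high] and its density is stored as unnormalized values on a grid.
    """

    def __init__(self, low=0.0, high=1.0, n_bins=51, sigma=0.05, lr=0.1, seed=0):
        step = (high - low) / (n_bins - 1)
        self.grid = [low + i * step for i in range(n_bins)]
        self.values = [1.0] * n_bins  # uniform initial density
        self.sigma = sigma            # kernel width: how far credit spreads
        self.lr = lr                  # learning rate for reward updates
        self.rng = random.Random(seed)

    def density(self):
        """Return the normalized probability of each grid point."""
        total = sum(self.values)
        return [v / total for v in self.values]

    def sample(self):
        """Draw a parameter value in proportion to the current density."""
        return self.rng.choices(self.grid, weights=self.values, k=1)[0]

    def update(self, tried, reward):
        """Spread the reward over the grid with a Gaussian kernel centered on
        the tried value, so nearby parameter values share the credit."""
        two_sigma_sq = 2.0 * self.sigma ** 2
        for i, x in enumerate(self.grid):
            kernel = math.exp(-((x - tried) ** 2) / two_sigma_sq)
            # Clamp at a small positive floor so negative rewards (if used)
            # can never make a density value zero or negative.
            self.values[i] = max(1e-6, self.values[i] + self.lr * reward * kernel)


def hypothetical_trial(weight, env_rng):
    """Stand-in for one soccer episode against a fixed-parameter opponent:
    the learning side scores with a probability that peaks at weight 0.7."""
    p_score = 0.8 * math.exp(-((weight - 0.7) ** 2) / (2 * 0.1 ** 2))
    return 1.0 if env_rng.random() < p_score else 0.0


def train(n_trials=3000, seed=0):
    """Run online trial-and-error learning and return the trained learner."""
    learner = GaussianKernelDensityLearner(seed=seed)
    env_rng = random.Random(seed + 1)  # separate RNG for the simulated matches
    for _ in range(n_trials):
        w = learner.sample()
        learner.update(w, hypothetical_trial(w, env_rng))
    return learner


if __name__ == "__main__":
    learner = train()
    dens = learner.density()
    best = max(range(len(dens)), key=dens.__getitem__)
    print(f"most probable weight: {learner.grid[best]:.2f}")
```

Running the script prints the most probable weight after 3,000 simulated trials. With the illustrative scoring curve, the learned density concentrates around 0.7.

The learner samples from the whole density rather than always playing its mode. This matches the abstract's point that in robot soccer a policy only wins with some probability.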

Authors: Gu Donglei, Chen Weidong, Xi Yugeng

Affiliation: Institute of Automation, Shanghai Jiao Tong University

Source: Pattern Recognition and Artificial Intelligence, 2001, No. 2, pp. 140-144 (5 pages). Indexed in EI, CSCD, and the Peking University Core Journals list.

Funding: National 863 Program of China

Keywords: Reinforcement Learning, Robot Soccer, Parameter Optimization, Control System, Mobile Robot

CLC number: TP242 [Automation and Computer Technology / Detection Technology and Automatic Equipment]
