
Multi-agent robot cooperation model in a dynamic environment (Cited by: 6)

The model of multi-agent cooperation in the dynamic environment
Abstract: A cooperation model for multiple agents in a dynamic environment is proposed, suited to complex situations where environment information is incomplete. Each agent's independent reinforcement learning is combined with the belief-desire-intention (BDI) model, so that the multi-agent system gains not only the high reactivity and adaptability of reinforcement learning but also the reasoning ability of BDI; reinforcement learning, which otherwise relies purely on numerical analysis and omits reasoning, is thereby coupled with logical inference. Boltzmann exploration is used to select random actions, and a new reward function and state representation reduce the learning space and raise the learning speed. Simulation results show that the proposed method is feasible and meets the requirements of a multi-agent system.
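The abstract names two standard ingredients, per-agent (independent) Q-learning and Boltzmann action selection, without reproducing the paper's reward function or state representation. The sketch below illustrates only those two generic ingredients; the action set, learning rate, discount factor, temperature, and environment interface are illustrative assumptions, not values or code from the paper.

```python
import math
import random
from collections import defaultdict

# Minimal sketch: independent Q-learning with Boltzmann (softmax) action
# selection. Parameters below are assumed for illustration only.
ACTIONS = ["up", "down", "left", "right"]
ALPHA, GAMMA = 0.1, 0.9          # learning rate and discount factor (assumed)


class IndependentQLearner:
    """One agent learning its own Q-table, ignoring the other agents' tables."""

    def __init__(self, temperature=1.0):
        self.q = defaultdict(float)   # (state, action) -> estimated value
        self.temperature = temperature

    def boltzmann_action(self, state):
        # Softmax over Q-values: better actions are picked more often, but
        # every action keeps a non-zero probability of being explored.
        prefs = [self.q[(state, a)] / self.temperature for a in ACTIONS]
        m = max(prefs)                              # stabilise the exponentials
        weights = [math.exp(p - m) for p in prefs]
        r, acc = random.random() * sum(weights), 0.0
        for a, w in zip(ACTIONS, weights):
            acc += w
            if r <= acc:
                return a
        return ACTIONS[-1]

    def update(self, state, action, reward, next_state):
        # Standard one-step Q-learning backup (Watkins, 1989).
        best_next = max(self.q[(next_state, a)] for a in ACTIONS)
        target = reward + GAMMA * best_next
        self.q[(state, action)] += ALPHA * (target - self.q[(state, action)])


# One interaction step (the environment interface here is hypothetical):
#   a = learner.boltzmann_action(s)
#   s2, r = env.step(a)
#   learner.update(s, a, r, s2)
```

Lowering the temperature over time shifts an agent from exploration toward exploiting its learned values; the paper's own reward shaping and compact state representation, credited in the abstract with shrinking the learning space, would replace the generic placeholders above.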
Source: Journal of Huazhong University of Science and Technology (Natural Science Edition), 2008, No. S1, pp. 39-41, 52 (4 pages). Indexed in EI, CAS, CSCD; Peking University core journal.
Funding: Key project of the National High-Tech Research and Development Program of China (2007AA041603).
Keywords: robot; multi-agent system; reinforcement learning; cooperation; dynamic environment

References (3)

Secondary references (15)

[1] M. L. Littman. Markov games as a framework for multi-agent reinforcement learning. In: Proc. of the 11th Int. Conf. on Machine Learning. San Francisco: Morgan Kaufmann, 1994: 157-163.
[2] J. Hu, M. P. Wellman. Multiagent reinforcement learning: theoretical framework and an algorithm. In: Proc. of the 15th Int. Conf. on Machine Learning. Morgan Kaufmann, 1998: 242-250.
[3] C. Claus, C. Boutilier. The dynamics of reinforcement learning in cooperative multiagent systems. In: Proc. of the 15th National Conf. on Artificial Intelligence. Cambridge: MIT Press, 1997: 235-262.
[4] D. H. Wolpert, K. Wheeler, K. Tumer, et al. General principles of learning-based multi-agent systems. In: Proc. of the Third Int. Conf. on Autonomous Agents. Seattle, 1999: 77-83.
[5] J. A. Boyan, M. L. Littman. Packet routing in dynamically changing networks: a reinforcement learning approach. Advances in Neural Information Processing Systems, 1993, 6: 671-678.
[6] R. H. Crites, A. G. Barto. Elevator group control using multiple reinforcement learning agents. Machine Learning, 1998, 33: 235-262.
[7] J. Schneider, W. K. Wong, A. Moore, et al. Distributed value functions. In: Proc. of the 16th Int. Conf. on Machine Learning. San Francisco: Morgan Kaufmann, 1999: 371-378.
[8] C. Watkins. Q-learning. Machine Learning, 1992, 8: 279-292.
[9] C. Watkins. Learning from delayed rewards. PhD dissertation. Cambridge: Cambridge University, 1989.
[10] A. G. Barto, R. S. Sutton, C. Watkins. Learning and sequential decision making. In: Learning and Computational Neuroscience: Foundations of Adaptive Networks. Cambridge: MIT Press, 1990: 539-602.

Literature sharing references with this paper: 11

Literature co-cited with this paper: 46

Citing literature: 6

Secondary citing literature: 11
