Abstract
In multi-agent reinforcement learning, parameter sharing, as a way of centralizing information during learning, can effectively alleviate the learning inefficiency caused by non-stationarity. However, in practical applications, having all agents use the same policy often has detrimental effects. To address this over-sharing problem, a new method is proposed that gives agents the ability to automatically identify peers that may benefit from parameter sharing and to dynamically select which agents to share parameters with during learning. Specifically, each agent encodes its historical trajectories as latent information that represents its potential intentions, and selects peers to share parameters with by comparing its latent information with that of the other agents. Experiments show that the proposed method not only improves the efficiency of parameter sharing but also preserves the quality of policy learning in multi-agent systems.
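The peer-selection idea described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the encoder, the comparison measure, or the selection rule, so everything below is an assumption made for illustration: `encode_trajectory` uses simple mean pooling as a stand-in for a learned encoder, intentions are compared with cosine similarity, and `threshold` is a hypothetical cut-off.

```python
import math
from typing import Dict, List, Sequence


def encode_trajectory(trajectory: Sequence[Sequence[float]]) -> List[float]:
    """Hypothetical stand-in encoder: average the per-step feature vectors.

    The paper uses a learned encoding of historical trajectories; mean
    pooling is only an assumption to keep this sketch self-contained.
    """
    steps = len(trajectory)
    dim = len(trajectory[0])
    return [sum(step[i] for step in trajectory) / steps for i in range(dim)]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two latent vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def select_sharing_peers(
    latents: Dict[str, List[float]], threshold: float = 0.9
) -> Dict[str, List[str]]:
    """For each agent, list the other agents whose latent is similar enough.

    The threshold rule is an assumed selection criterion, not the paper's.
    """
    return {
        name: [
            other
            for other, z_other in latents.items()
            if other != name and cosine_similarity(z, z_other) >= threshold
        ]
        for name, z in latents.items()
    }


# Toy trajectories: agent_0 and agent_1 behave alike, agent_2 differs.
trajectories = {
    "agent_0": [[1.0, 0.0], [0.9, 0.1]],
    "agent_1": [[0.8, 0.2], [1.0, 0.0]],
    "agent_2": [[0.0, 1.0], [0.1, 0.9]],
}
latents = {name: encode_trajectory(t) for name, t in trajectories.items()}
peers = select_sharing_peers(latents)
print(peers)
```

In this toy setting, agent_0 and agent_1 select each other as sharing peers while agent_2 shares with no one. Recomputing `peers` periodically as trajectories accumulate would correspond to the dynamic selection mentioned in the abstract.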
Authors
王涵
俞扬
姜远
WANG Han; YU Yang; JIANG Yuan (State Key Laboratory for Novel Software Technology at Nanjing University, Nanjing 210023, China)
Source
《智能科学与技术学报》
2022, Issue 1, pp. 75-83 (9 pages)
Chinese Journal of Intelligent Science and Technology
Funding
Supported by the National Natural Science Foundation of China (No. 61876077).
Keywords
multi-agent system
reinforcement learning
parameter sharing