Abstract
This paper studies the multi-agent confrontation problem using reinforcement learning. Building on a fully centralized training architecture, it adopts a policy-based reinforcement learning algorithm. To address the sparse-reward problem that is widespread in this field, it adopts a reward engineering method based on local parts of the task, which uses human experience and knowledge as guidance to accelerate training and improve the training results. Finally, simulation experiments are carried out in an attack-defense scenario, a representative confrontation problem, and verify the effectiveness of the method.
Authors
王瑞星
董诗音
江飞龙
黄胜全
WANG Rui-xing; DONG Shi-yin; JIANG Fei-long; HUANG Sheng-quan (Deep Space Exploration Research Center, Harbin Institute of Technology, Harbin 150001, China; Shanghai Electro-Mechanical Engineering Institute, Shanghai 201109, China)
Source
《信息技术》
2021, No. 5, pp. 12-20 (9 pages)
Information Technology
Funding
Equipment Pre-Research Fund of the Equipment Development Department, Central Military Commission (JZX7Y20-190243001201).