Behavioral Cloning with Active Sampling of Demonstration (基于示范主动采样的行为克隆方法)
Cited by: 1
Abstract: Deep reinforcement learning has achieved great success in many applications. However, it usually needs a large number of interactions with the environment to learn a policy, which makes training inefficient. Imitation learning addresses this challenge by learning from expert demonstrations, but it requires a large set of expert demonstration trajectories, which can be costly to collect in complex tasks. This paper proposes a behavioral cloning algorithm based on active learning that reduces the demonstration cost by actively selecting the starting state of each demonstration. The method is based on two strategies, uncertainty sampling and dissimilarity sampling: it selects the most valuable state from a candidate set as the starting state and then queries the expert for a demonstration trajectory of fixed length, with the aim of learning an effective policy from as few demonstrations as possible. Experiments on multiple tasks show that the proposed method can perform behavioral cloning with fewer demonstration trajectories, lowering the cost of expert demonstrations in reinforcement learning.
Authors: HUANG Wenyu (黄文宇); HUANG Shengjun (黄圣君) (College of Computer Science and Technology / College of Artificial Intelligence, Nanjing University of Aeronautics & Astronautics, Nanjing 211106, China)
Source: Journal of Nanjing University of Aeronautics & Astronautics (《南京航空航天大学学报》), indexed in CAS, CSCD, and the Peking University Core Journals list (北大核心), 2021, No. 5, pp. 766-771 (6 pages)
Funding: Supported by the Aviation Power Fund (航空动力基金, No. 6141B09050342).
Keywords: reinforcement learning; imitation learning; behavioral cloning; inverse reinforcement learning; active learning
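
The selection step described in the abstract (score each candidate state by uncertainty and by dissimilarity, then pick the best one as the start of the next demonstration) can be illustrated with a small sketch. The abstract does not say how either quantity is measured or how the two are combined, so everything specific here is an assumption made for illustration: uncertainty is taken as the entropy of the current policy's action distribution, dissimilarity as the Euclidean distance to the nearest already-demonstrated state, and the two are mixed with a hypothetical `weight` parameter. The function names are likewise hypothetical and are not taken from the paper.

```python
import numpy as np


def entropy(probs):
    """Shannon entropy of each row of an (n, n_actions) probability matrix."""
    p = np.clip(probs, 1e-12, 1.0)  # avoid log(0)
    return -(p * np.log(p)).sum(axis=1)


def dissimilarity(candidates, demonstrated):
    """Distance from each candidate state to its nearest demonstrated state."""
    if len(demonstrated) == 0:
        # No demonstrations yet: every candidate is equally novel.
        return np.ones(len(candidates))
    diffs = candidates[:, None, :] - demonstrated[None, :, :]
    return np.linalg.norm(diffs, axis=2).min(axis=1)


def rescale(x):
    """Map scores to [0, 1] so the two criteria are comparable."""
    span = x.max() - x.min()
    return (x - x.min()) / span if span > 0 else np.zeros_like(x)


def select_start_state(candidates, action_probs, demonstrated, weight=0.5):
    """Index of the candidate with the highest combined score.

    weight = 1.0 uses uncertainty only, weight = 0.0 dissimilarity only.
    The weighted sum is an assumption, not the paper's stated rule.
    """
    u = rescale(entropy(action_probs))
    d = rescale(dissimilarity(candidates, demonstrated))
    return int(np.argmax(weight * u + (1.0 - weight) * d))


# Toy example: three 2-D candidate states, two discrete actions.
candidates = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])
action_probs = np.array([[0.9, 0.1], [0.5, 0.5], [0.6, 0.4]])
demonstrated = np.array([[0.0, 0.0], [1.0, 1.0]])

best = select_start_state(candidates, action_probs, demonstrated)
# best == 2: the third state is far from existing demonstrations
# and the policy is still fairly uncertain there.
```

In a full loop, the expert would then be queried for a fixed-length trajectory starting from `candidates[best]`, the visited states would be appended to `demonstrated`, and the policy would be retrained by behavioral cloning before the next selection round.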