Abstract
Reinforcement learning based on causal modeling is gaining popularity in intelligent control. Beyond inferring causal structure from data, causality provides an explainable framework for intervening on a system and analyzing how it responds. Quantifying the effects of interventions lets an agent evaluate policy performance in complex settings (e.g., in the presence of confounders or in nonstationary environments), improving the robustness and generalization of the algorithm. This paper surveys recent advances in reinforcement learning control based on causal modeling (hereafter, causal reinforcement learning) and clarifies how causality connects to the different modules of a control system. First, the basic concepts and classical algorithms of reinforcement learning are introduced, and two main shortcomings are discussed: the lack of causal explanation of the relationships among variables, and poor policy generalization in transfer scenarios. Second, the main research directions in causality are reviewed, namely causal effect estimation and causal discovery, which offer feasible solutions to these shortcomings. Next, the paper explains how causal theory can be used to improve control and decision-making in reinforcement learning systems, summarizes four categories of research directions in causal reinforcement learning and their progress, and surveys real-world applications. Finally, the paper concludes by pointing out the limitations and open problems of causal reinforcement learning and outlining future research directions.
Authors
SUN Yue-Wen (孙悦雯)
LIU Wen-Zhang (柳文章)
SUN Chang-Yin (孙长银)
Affiliations: School of Automation, Southeast University, Nanjing 210096; School of Artificial Intelligence, Anhui University, Hefei 230601; Engineering Research Center of Autonomous Unmanned System Technology, Ministry of Education, Hefei 230601; Anhui Unmanned System and Intelligent Technology Engineering Research Center, Hefei 230601
Source
Acta Automatica Sinica (《自动化学报》), 2023, No. 3, pp. 661-677 (17 pages)
Indexed in: EI, CAS, CSCD, Peking University Core Journals (北大核心)
Funding
Supported by the National Natural Science Foundation of China (62236002, 61921004).
Keywords
Reinforcement learning control
Causal discovery
Causal inference
Transfer learning
Representation learning