基于因果建模的强化学习控制: 现状及展望

Translated title of the contribution: Causality in Reinforcement Learning Control: The State of the Art and Prospects

Research output: Contribution to journal › Article › peer-review

7 Scopus citations

Abstract

Causality research has shown its potential and advantages in the reinforcement learning community. Beyond its inherent capability of inferring causal structure from data, causality provides an explainable toolset for investigating how a system would react to an intervention. Quantifying the effects of interventions allows actionable decisions to be made while maintaining robustness in complex systems (e.g., in the presence of confounders or in nonstationary environments). This paper explores how causality can be incorporated into different aspects of control systems and introduces recent advances in causal reinforcement learning. First, the concepts and algorithms of reinforcement learning are introduced, and two main challenges are discussed: the lack of causal explanation for observed variables and the difficulty of transfer across environments. Second, the main lines of research within causality are reviewed, including causal effect estimation and causal discovery, which offer potential solutions to these challenges. Next, the paper describes how causality can be embedded in reinforcement learning systems. Four categories of research advances in causal reinforcement learning are summarized and analyzed, followed by real-world applications. Finally, the paper concludes with open problems and prospects for future work.

Original language: Chinese (Traditional)
Pages (from-to): 661-677
Number of pages: 17
Journal: Zidonghua Xuebao/Acta Automatica Sinica
Volume: 49
Issue number: 3
DOIs
State: Published - Mar 2023
Externally published: Yes
