A Survey on Causal Reinforcement Learning.

Author information

Zeng Yan, Cai Ruichu, Sun Fuchun, Huang Libo, Hao Zhifeng

Publication information

IEEE Trans Neural Netw Learn Syst. 2025 Apr;36(4):5942-5962. doi: 10.1109/TNNLS.2024.3403001. Epub 2025 Apr 4.

DOI:10.1109/TNNLS.2024.3403001

Abstract

While reinforcement learning (RL) has achieved tremendous success in sequential decision-making problems across many domains, it still faces two key challenges: data inefficiency and a lack of interpretability. Recently, many researchers have drawn on insights from the causality literature, producing a growing body of work that brings the merits of causality to bear on these RL challenges. It is therefore both necessary and significant to collate these causal RL (CRL) works, review CRL methods, and investigate what causality can contribute to RL. Specifically, we divide existing CRL approaches into two categories according to whether their causality-based information is given in advance. We further analyze each category in terms of its underlying formal model: the Markov decision process (MDP), the partially observed MDP (POMDP), multiarmed bandits (MABs), imitation learning (IL), and the dynamic treatment regime (DTR), each of which corresponds to a distinct type of causal graph. Finally, we summarize evaluation metrics and open-source resources, and we discuss emerging applications and promising directions for the future development of CRL.

Similar articles

Incorporating causal factors into reinforcement learning for dynamic treatment regimes in HIV.

BMC Med Inform Decis Mak. 2019 Apr 9;19(Suppl 2):60. doi: 10.1186/s12911-019-0755-6.

Inverse reinforcement learning for intelligent mechanical ventilation and sedative dosing in intensive care units.

BMC Med Inform Decis Mak. 2019 Apr 9;19(Suppl 2):57. doi: 10.1186/s12911-019-0763-6.

Optimizing Attention and Cognitive Control Costs Using Temporally Layered Architectures.

Neural Comput. 2024 Nov 19;36(12):2734-2763. doi: 10.1162/neco_a_01718.

Reinforcement Learning Methods in Public Health.

Clin Ther. 2022 Jan;44(1):139-154. doi: 10.1016/j.clinthera.2021.11.002. Epub 2022 Jan 19.

GFANC-RL: Reinforcement Learning-based Generative Fixed-filter Active Noise Control.

Neural Netw. 2024 Dec;180:106687. doi: 10.1016/j.neunet.2024.106687. Epub 2024 Sep 5.

A delay-robust method for enhanced real-time reinforcement learning.

Neural Netw. 2025 Jan;181:106769. doi: 10.1016/j.neunet.2024.106769. Epub 2024 Oct 1.

HMM for discovering decision-making dynamics using reinforcement learning experiments.

Biostatistics. 2024 Dec 31;26(1). doi: 10.1093/biostatistics/kxae033.

Salience Interest Option: Temporal abstraction with salience interest functions.

Neural Netw. 2024 Aug;176:106342. doi: 10.1016/j.neunet.2024.106342. Epub 2024 Apr 25.

Systematic literature review on reinforcement learning in non-communicable disease interventions.

Artif Intell Med. 2024 Aug;154:102901. doi: 10.1016/j.artmed.2024.102901. Epub 2024 Jun 4.

Cited by

Efficient structure learning of gene regulatory networks with Bayesian active learning.

BMC Bioinformatics. 2025 Jun 3;26(1):150. doi: 10.1186/s12859-025-06149-6.

Causality, Machine Learning, and Feature Selection: A Survey.

Sensors (Basel). 2025 Apr 9;25(8):2373. doi: 10.3390/s25082373.
