A Survey on Causal Reinforcement Learning.

Authors

Zeng Yan, Cai Ruichu, Sun Fuchun, Huang Libo, Hao Zhifeng

Publication information

IEEE Trans Neural Netw Learn Syst. 2025 Apr;36(4):5942-5962. doi: 10.1109/TNNLS.2024.3403001. Epub 2025 Apr 4.

Abstract

While reinforcement learning (RL) has achieved tremendous success in sequential decision-making problems across many domains, it still faces two key challenges: data inefficiency and a lack of interpretability. Many researchers have recently drawn on insights from the causality literature, producing a growing body of work that brings the strengths of causality to bear on these challenges. It is therefore both necessary and timely to collate these causal RL (CRL) works, review CRL methods, and investigate what causality can offer RL. In particular, we divide existing CRL approaches into two categories according to whether their causality-based information is given in advance. We further analyze each category in terms of the formalization of different models, including the Markov decision process (MDP), partially observed MDP (POMDP), multiarmed bandits (MABs), imitation learning (IL), and dynamic treatment regime (DTR). Each of these corresponds to a distinct type of causal graphical representation. Moreover, we summarize the evaluation metrics and open-source resources, and we discuss emerging applications along with promising prospects for the future development of CRL.
