
: A Reinforcement Learning Benchmark for Dynamic Treatment Regimes

Authors

Hargrave Mason, Spaeth Alex, Grosenick Logan

Affiliations

Center for Studies in Physics and Biology, The Rockefeller University, New York, NY, USA.

Dept. of Electrical and Computer Engineering, University of California, Santa Cruz, Santa Cruz, CA, USA.

Publication

Adv Neural Inf Process Syst. 2024;37:130536-130568.

PMID: 40453101
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12124763/
Abstract

Healthcare applications pose significant challenges to existing reinforcement learning (RL) methods due to implementation risks, limited data availability, short treatment episodes, sparse rewards, partial observations, and heterogeneous treatment effects. Despite significant interest in using RL to generate dynamic treatment regimes for longitudinal patient care scenarios, no standardized benchmark has yet been developed. To fill this need, we introduce (), a benchmark designed to mimic the challenges associated with applying RL to longitudinal healthcare settings. We leverage this benchmark to test five state-of-the-art offline RL models as well as five common off-policy evaluation (OPE) techniques. Our results suggest that while offline RL may be capable of improving upon existing standards of care given sufficient data, its applicability does not appear to extend to the moderate to low data regimes typical of current healthcare settings. Additionally, we demonstrate that several OPE techniques standard in the medical RL literature fail to perform adequately on our benchmark. These results suggest that the performance of RL models in dynamic treatment regimes may be difficult to meaningfully evaluate using current OPE methods, indicating that RL for this application domain may still be in its early stages. We hope that these results along with the benchmark will facilitate better comparison of existing methods and inspire further research into techniques that increase the practical applicability of medical RL.
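The abstract evaluates off-policy evaluation (OPE) techniques, which estimate a target policy's value from trajectories logged under a different behavior policy. As a minimal illustration of the idea (this is ordinary importance sampling, one of the simplest OPE estimators, not the paper's specific implementation; the trajectory format and function name here are assumptions), each logged step contributes a likelihood ratio between the target and behavior policies, and the cumulative ratio reweights the episode's return:

```python
def ois_estimate(trajectories, gamma=1.0):
    """Ordinary importance sampling (OIS) estimate of a target policy's value.

    Each trajectory is a list of (pi_e, pi_b, reward) tuples: the target
    policy's probability of the logged action, the behavior policy's
    probability of that same action, and the observed reward.
    """
    total = 0.0
    for traj in trajectories:
        weight, ret, discount = 1.0, 0.0, 1.0
        for pi_e, pi_b, r in traj:
            weight *= pi_e / pi_b   # cumulative likelihood ratio
            ret += discount * r     # discounted return of the episode
            discount *= gamma
        total += weight * ret       # reweighted return
    return total / len(trajectories)
```

When the target and behavior policies coincide, the estimate reduces to the empirical mean return; as they diverge, the weights grow and the estimator's variance explodes, which is one reason OPE is hard in the sparse-reward, short-episode settings the abstract describes.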


Figures

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/79e7/12124763/0aae77f5768c/nihms-2043160-f0001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/79e7/12124763/e912c4d703d1/nihms-2043160-f0002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/79e7/12124763/100ab5e05113/nihms-2043160-f0003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/79e7/12124763/61b879e3b3d9/nihms-2043160-f0004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/79e7/12124763/9521492e6264/nihms-2043160-f0005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/79e7/12124763/79b74ff306bd/nihms-2043160-f0006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/79e7/12124763/5477052110a5/nihms-2043160-f0007.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/79e7/12124763/ee61b4dcf873/nihms-2043160-f0008.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/79e7/12124763/d0c00a0aa725/nihms-2043160-f0009.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/79e7/12124763/624899b15e37/nihms-2043160-f0010.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/79e7/12124763/a3a3119c0a6f/nihms-2043160-f0011.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/79e7/12124763/7b54db7ecee0/nihms-2043160-f0012.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/79e7/12124763/7254dff70163/nihms-2043160-f0013.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/79e7/12124763/24c55dbb935f/nihms-2043160-f0014.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/79e7/12124763/e79417bfba85/nihms-2043160-f0015.jpg

Similar Articles

1. : A Reinforcement Learning Benchmark for Dynamic Treatment Regimes. Adv Neural Inf Process Syst. 2024;37:130536-130568.
2. Reinforcement learning for intensive care medicine: actionable clinical insights from novel approaches to reward shaping and off-policy model evaluation. Intensive Care Med Exp. 2024 Mar 25;12(1):32. doi: 10.1186/s40635-024-00614-x.
3. Model Selection for Offline Reinforcement Learning: Practical Considerations for Healthcare Settings. Proc Mach Learn Res. 2021 Aug;149:2-35.
4. Efficient Offline Reinforcement Learning With Relaxed Conservatism. IEEE Trans Pattern Anal Mach Intell. 2024 Aug;46(8):5260-5272. doi: 10.1109/TPAMI.2024.3364844. Epub 2024 Jul 2.
5. A review of reinforcement learning for natural language processing and applications in healthcare. J Am Med Inform Assoc. 2024 Oct 1;31(10):2379-2393. doi: 10.1093/jamia/ocae215.
6. RL-MLZerD: Multimeric protein docking using reinforcement learning. Front Mol Biosci. 2022 Aug 26;9:969394. doi: 10.3389/fmolb.2022.969394. eCollection 2022.
7. Prescription of Controlled Substances: Benefits and Risks.
8. Improving Offline Reinforcement Learning With In-Sample Advantage Regularization for Robot Manipulation. IEEE Trans Neural Netw Learn Syst. 2024 Sep 20;PP. doi: 10.1109/TNNLS.2024.3443102.
9. Transatlantic transferability of a new reinforcement learning model for optimizing haemodynamic treatment for critically ill patients with sepsis. Artif Intell Med. 2021 Feb;112:102003. doi: 10.1016/j.artmed.2020.102003. Epub 2020 Dec 15.
10. Bi-DexHands: Towards Human-Level Bimanual Dexterous Manipulation. IEEE Trans Pattern Anal Mach Intell. 2024 May;46(5):2804-2818. doi: 10.1109/TPAMI.2023.3339515. Epub 2024 Apr 3.
