

Adversarial reinforcement learning for dynamic treatment regimes.

Author Information

Sun Zhaohong, Dong Wei, Li Haomin, Huang Zhengxing

Affiliations

Zhejiang University, Hangzhou, China.

Department of Cardiology, Chinese PLA General Hospital, Beijing, China.

Publication Information

J Biomed Inform. 2023 Jan;137:104244. doi: 10.1016/j.jbi.2022.104244. Epub 2022 Nov 17.

DOI:10.1016/j.jbi.2022.104244
PMID:36402277
Abstract

Treatment recommendation, as a critical task of delivering effective interventions according to patient state and expected outcome, plays a vital role in precision medicine and healthcare management. As a well-suited tactic to learn optimal policies of recommender systems, reinforcement learning is promising to address the challenge of treatment recommendation. However, existing solutions mostly require frequent interactions between treatment recommender systems and clinical environment, which are expensive, time-consuming, and even infeasible in clinical practice. In this study, we present a novel model-based offline reinforcement learning approach to optimize a treatment policy by utilizing patient treatment trajectories in Electronic Health Records (EHRs). Specifically, a patient treatment trajectory simulator is firstly constructed based on the ground-truth trajectories in EHRs. Thereafter, the constructed simulator is utilized to model the online interactions between the treatment recommender system and clinical environment. In this way, the counterfactual trajectories can be generated. To alleviate the bias deriving from the ground-truth and the counterfactual trajectories, an adversarial network is incorporated into the proposed model, such that a large space of treatment actions can be explored with the scaled rewards. The proposed model is evaluated on a simulated dataset and a real-world dataset. The experimental results demonstrate that the proposed model is superior to other methods, giving rise to a new solution for dynamic treatment regimes and beyond.
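The pipeline the abstract describes — fit a trajectory simulator to ground-truth EHR data, roll out counterfactual trajectories from it, and use an adversarial signal to scale rewards on implausible transitions — can be caricatured in a toy tabular setting. This is a minimal sketch, not the paper's method: the discrete MDP, the "recovered at state 0" reward, and the density-ratio stand-in for the adversarial discriminator are all illustrative assumptions (the paper uses a learned adversarial network over real EHR trajectories).

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "EHR" log: (state, action, next_state) transitions over a discrete space.
N_STATES, N_ACTIONS = 4, 2
real = [(rng.integers(N_STATES), rng.integers(N_ACTIONS), rng.integers(N_STATES))
        for _ in range(200)]

# 1) Fit a trajectory simulator: smoothed empirical transition model P(s' | s, a).
counts = np.ones((N_STATES, N_ACTIONS, N_STATES))  # Laplace smoothing
for s, a, s2 in real:
    counts[s, a, s2] += 1
P = counts / counts.sum(axis=2, keepdims=True)

def discriminator(s, a, s2):
    # Stand-in for the adversarial network: a density ratio of the observed
    # transition distribution vs. uniform. >1 means plausible under the data.
    return P[s, a, s2] * N_STATES

def reward(s2):
    # Hypothetical clinical reward: reaching state 0 ("recovered") pays off.
    return 1.0 if s2 == 0 else 0.0

# 2) Offline Q-learning on simulator rollouts with adversarially scaled rewards,
#    so counterfactual actions are explored but implausible ones are down-weighted.
gamma, alpha = 0.9, 0.1
Q = np.zeros((N_STATES, N_ACTIONS))
for _ in range(5000):
    s = int(rng.integers(N_STATES))
    a = int(rng.integers(N_ACTIONS))           # explore the full action space
    s2 = int(rng.choice(N_STATES, p=P[s, a]))  # counterfactual rollout step
    w = min(1.0, discriminator(s, a, s2))      # reward scaling
    Q[s, a] += alpha * (w * reward(s2) + gamma * Q[s2].max() - Q[s, a])

policy = Q.argmax(axis=1)  # greedy treatment policy per patient state
```

The separation matters: the simulator replaces the expensive online interaction with the clinical environment, while the discriminator keeps the policy from exploiting transitions the simulator invents but the data never supports.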


Similar Articles

1. Adversarial reinforcement learning for dynamic treatment regimes.
   J Biomed Inform. 2023 Jan;137:104244. doi: 10.1016/j.jbi.2022.104244. Epub 2022 Nov 17.
2. Plug-and-Play Model-Agnostic Counterfactual Policy Synthesis for Deep Reinforcement Learning-Based Recommendation.
   IEEE Trans Neural Netw Learn Syst. 2025 Jan;36(1):1044-1055. doi: 10.1109/TNNLS.2023.3329808. Epub 2025 Jan 7.
3. Adversarial Robustness of Deep Reinforcement Learning Based Dynamic Recommender Systems.
   Front Big Data. 2022 May 3;5:822783. doi: 10.3389/fdata.2022.822783. eCollection 2022.
4. Endpoint prediction of heart failure using electronic health records.
   J Biomed Inform. 2020 Sep;109:103518. doi: 10.1016/j.jbi.2020.103518. Epub 2020 Jul 25.
5. CIPL: Counterfactual Interactive Policy Learning to Eliminate Popularity Bias for Online Recommendation.
   IEEE Trans Neural Netw Learn Syst. 2024 Dec;35(12):17123-17136. doi: 10.1109/TNNLS.2023.3299929. Epub 2024 Dec 2.
6. Black-box attacks on dynamic graphs via adversarial topology perturbations.
   Neural Netw. 2024 Mar;171:308-319. doi: 10.1016/j.neunet.2023.11.060. Epub 2023 Dec 1.
7. Deep reinforcement learning for personalized treatment recommendation.
   Stat Med. 2022 Sep 10;41(20):4034-4056. doi: 10.1002/sim.9491. Epub 2022 Jun 18.
8. Optimum trajectory learning in musculoskeletal systems with model predictive control and deep reinforcement learning.
   Biol Cybern. 2022 Dec;116(5-6):711-726. doi: 10.1007/s00422-022-00940-x. Epub 2022 Aug 11.
9. Applying Reinforcement Learning for Enhanced Cybersecurity against Adversarial Simulation.
   Sensors (Basel). 2023 Mar 10;23(6):3000. doi: 10.3390/s23063000.
10. An Improved Approach towards Multi-Agent Pursuit-Evasion Game Decision-Making Using Deep Reinforcement Learning.
    Entropy (Basel). 2021 Oct 29;23(11):1433. doi: 10.3390/e23111433.

Cited By

1. TN5000: An Ultrasound Image Dataset for Thyroid Nodule Detection and Classification.
   Sci Data. 2025 Aug 16;12(1):1437. doi: 10.1038/s41597-025-05757-4.
2. Reinforcement Learning and Its Clinical Applications Within Healthcare: A Systematic Review of Precision Medicine and Dynamic Treatment Regimes.
   Healthcare (Basel). 2025 Jul 19;13(14):1752. doi: 10.3390/healthcare13141752.
3. : A Reinforcement Learning Benchmark for Dynamic Treatment Regimes.
   Adv Neural Inf Process Syst. 2024;37:130536-130568.
4. Reinforcement Learning in Personalized Medicine: A Comprehensive Review of Treatment Optimization Strategies.
   Cureus. 2025 Apr 21;17(4):e82756. doi: 10.7759/cureus.82756. eCollection 2025 Apr.