Suppr 超能文献


Q-learning for estimating optimal dynamic treatment rules from observational data.

Authors

Moodie Erica E M, Chakraborty Bibhas, Kramer Michael S

Affiliation

McGill University, Department of Epidemiology, Biostatistics, and Occupational Health, QC, Canada H3A 1A2.

Publication

Can J Stat. 2012 Dec 1;40(4):629-645. doi: 10.1002/cjs.11162. Epub 2012 Nov 7.

DOI: 10.1002/cjs.11162
PMID: 23355757
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3551601/
Abstract

The area of dynamic treatment regimes (DTR) aims to make inference about adaptive, multistage decision-making in clinical practice. A DTR is a set of decision rules, one per interval of treatment, where each decision is a function of treatment and covariate history that returns a recommended treatment. Q-learning is a popular method from the reinforcement learning literature that has recently been applied to estimate DTRs. While, in principle, Q-learning can be used for both randomized and observational data, the focus in the literature thus far has been exclusively on the randomized treatment setting. We extend the method to incorporate measured confounding covariates, using direct adjustment and a variety of propensity score approaches. The methods are examined under various settings including non-regular scenarios. We illustrate the methods in examining the effect of breastfeeding on vocabulary testing, based on data from the Promotion of Breastfeeding Intervention Trial.
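The backward-induction scheme the abstract describes can be sketched for a two-stage setting: fit a stage-2 outcome regression with a treatment-by-covariate interaction, form a pseudo-outcome as the predicted outcome under the optimal stage-2 decision, then regress that pseudo-outcome on stage-1 history. The simulation below is a hypothetical illustration, not the paper's data or models; treatments are randomized here for simplicity, whereas the paper's contribution is handling measured confounding.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Hypothetical two-stage generative model (illustration only, not the paper's data).
X1 = rng.normal(size=n)                    # stage-1 covariate
A1 = rng.integers(0, 2, size=n)            # stage-1 treatment (randomized in this sketch)
X2 = 0.5 * X1 + rng.normal(size=n)         # stage-2 covariate
A2 = rng.integers(0, 2, size=n)            # stage-2 treatment
Y = X1 + X2 + A1 * (0.5 + X1) + A2 * (1.0 - X2) + rng.normal(size=n)

def ols(D, y):
    """Least-squares coefficients for design matrix D."""
    beta, *_ = np.linalg.lstsq(D, y, rcond=None)
    return beta

# Stage 2: regress Y on history and treatment, with treatment-covariate interactions.
D2 = np.column_stack([np.ones(n), X1, X2, A1, A1 * X1, A2, A2 * X2])
b2 = ols(D2, Y)

def q2(x1, x2, a1, a2):
    """Fitted stage-2 Q-function."""
    return (b2[0] + b2[1] * x1 + b2[2] * x2 + b2[3] * a1 + b2[4] * a1 * x1
            + a2 * (b2[5] + b2[6] * x2))

# Pseudo-outcome: predicted outcome under the *optimal* stage-2 decision.
y_tilde = np.maximum(q2(X1, X2, A1, 0), q2(X1, X2, A1, 1))

# Stage 1: regress the pseudo-outcome on stage-1 history and treatment.
D1 = np.column_stack([np.ones(n), X1, A1, A1 * X1])
b1 = ols(D1, y_tilde)

# Estimated decision rules: treat when the estimated treatment "blip" is positive.
d2 = lambda x2: (b2[5] + b2[6] * x2 > 0).astype(int)  # truth here: treat iff 1 - X2 > 0
d1 = lambda x1: (b1[2] + b1[3] * x1 > 0).astype(int)  # truth here: treat iff 0.5 + X1 > 0
```

In the observational setting the paper studies, the confounders that drive treatment assignment would be included in each stage's design matrix (direct adjustment), or each regression would be modified with an estimated propensity score (e.g., as a covariate or an inverse-probability weight); with the randomized treatments above, neither adjustment is needed.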


Similar Articles

1. Q-learning for estimating optimal dynamic treatment rules from observational data.
   Can J Stat. 2012 Dec 1;40(4):629-645. doi: 10.1002/cjs.11162. Epub 2012 Nov 7.
2. Imputation-based Q-learning for optimizing dynamic treatment regimes with right-censored survival outcome.
   Biometrics. 2023 Dec;79(4):3676-3689. doi: 10.1111/biom.13872. Epub 2023 May 17.
3. New Statistical Learning Methods for Estimating Optimal Dynamic Treatment Regimes.
   J Am Stat Assoc. 2015;110(510):583-598. doi: 10.1080/01621459.2014.937488.
4. Bayesian inference for optimal dynamic treatment regimes in practice.
   Int J Biostat. 2023 May 17;19(2):309-331. doi: 10.1515/ijb-2022-0073. eCollection 2023 Nov 1.
5. Estimating tree-based dynamic treatment regimes using observational data with restricted treatment sequences.
   Biometrics. 2023 Sep;79(3):2260-2271. doi: 10.1111/biom.13754. Epub 2022 Oct 9.
6. Use of personalized Dynamic Treatment Regimes (DTRs) and Sequential Multiple Assignment Randomized Trials (SMARTs) in mental health studies.
   Shanghai Arch Psychiatry. 2014 Dec;26(6):376-83. doi: 10.11919/j.issn.1002-0829.214172.
7. Multiobjective tree-based reinforcement learning for estimating tolerant dynamic treatment regimes.
   Biometrics. 2024 Jan 29;80(1). doi: 10.1093/biomtc/ujad017.
8. Step-adjusted tree-based reinforcement learning for evaluating nested dynamic treatment regimes using test-and-treat observational data.
   Stat Med. 2021 Nov 30;40(27):6164-6177. doi: 10.1002/sim.9177. Epub 2021 Sep 7.
9. Reward ignorant modeling of dynamic treatment regimes.
   Biom J. 2018 Sep;60(5):991-1002. doi: 10.1002/bimj.201700322. Epub 2018 May 30.
10. Inference about the expected performance of a data-driven dynamic treatment regime.
    Clin Trials. 2014 Aug;11(4):408-417. doi: 10.1177/1740774514537727. Epub 2014 Jun 12.

Cited By

1. Optimising dynamic treatment regimens using sequential multiple assignment randomised trials data with missing data.
   BMC Med Res Methodol. 2025 Jul 1;25(1):162. doi: 10.1186/s12874-025-02595-1.
2. Deep learning-based ranking method for subgroup and predictive biomarker identification in patients.
   Commun Med (Lond). 2025 Jun 10;5(1):221. doi: 10.1038/s43856-025-00946-z.
3. Dynamic Treatment Regimes on Dyadic Networks.
   Stat Med. 2024 Dec 30;43(30):5944-5967. doi: 10.1002/sim.10278. Epub 2024 Nov 28.
4. Learning optimal dynamic treatment regimes from longitudinal data.
   Am J Epidemiol. 2024 Dec 2;193(12):1768-1775. doi: 10.1093/aje/kwae122.
5. Integrating randomized and observational studies to estimate optimal dynamic treatment regimes.
   Biometrics. 2024 Mar 27;80(2). doi: 10.1093/biomtc/ujae046.
6. When the Ends do not Justify the Means: Learning Who is Predicted to Have Harmful Indirect Effects.
   J R Stat Soc Ser A Stat Soc. 2022 Dec;185(Suppl 2):S573-S589. doi: 10.1111/rssa.12951. Epub 2022 Nov 8.
7. Estimating individualized treatment rules in longitudinal studies with covariate-driven observation times.
   Stat Methods Med Res. 2023 May;32(5):868-884. doi: 10.1177/09622802231158733. Epub 2023 Mar 16.
8. Optimizing Health Coaching for Patients With Type 2 Diabetes Using Machine Learning: Model Development and Validation Study.
   JMIR Form Res. 2022 Sep 13;6(9):e37838. doi: 10.2196/37838.
9. Scaling Interventions to Manage Chronic Disease: Innovative Methods at the Intersection of Health Policy Research and Implementation Science.
   Prev Sci. 2024 Apr;25(Suppl 1):96-108. doi: 10.1007/s11121-022-01427-8. Epub 2022 Sep 1.
10. Deep reinforcement learning for personalized treatment recommendation.
    Stat Med. 2022 Sep 10;41(20):4034-4056. doi: 10.1002/sim.9491. Epub 2022 Jun 18.

References

1. Penalized Q-Learning for Dynamic Treatment Regimens.
   Stat Sin. 2015 Jul;25(3):901-920. doi: 10.5705/ss.2012.364.
2. Inference for optimal dynamic treatment regimes using an adaptive m-out-of-n bootstrap scheme.
   Biometrics. 2013 Sep;69(3):714-23. doi: 10.1111/biom.12052. Epub 2013 Jul 11.
3. Informing sequential clinical decision-making through reinforcement learning: an empirical study.
   Mach Learn. 2011 Jul 1;84(1-2):109-136. doi: 10.1007/s10994-010-5229-0.
4. Reinforcement learning strategies for clinical trials in nonsmall cell lung cancer.
   Biometrics. 2011 Dec;67(4):1422-33. doi: 10.1111/j.1541-0420.2011.01572.x. Epub 2011 Mar 8.
5. Dynamic treatment regimes for managing chronic health conditions: a statistical perspective.
   Am J Public Health. 2011 Jan;101(1):40-5. doi: 10.2105/AJPH.2010.198937. Epub 2010 Nov 18.
6. Optimal dynamic regimes: presenting a case for predictive inference.
   Int J Biostat. 2010 Mar 3;6(2):Article 10. doi: 10.2202/1557-4679.1204.
7. Estimating Optimal Dynamic Regimes: Correcting Bias under the Null.
   Scand Stat Theory Appl. 2009 Sep 22;37(1):126-146. doi: 10.1111/j.1467-9469.2009.00661.x.
8. Regret-regression for optimal dynamic treatment regimes.
   Biometrics. 2010 Dec;66(4):1192-201. doi: 10.1111/j.1541-0420.2009.01368.x.
9. Reinforcement learning design for cancer clinical trials.
   Stat Med. 2009 Nov 20;28(26):3294-315. doi: 10.1002/sim.3720.
10. Inference for non-regular parameters in optimal dynamic treatment regimes.
    Stat Methods Med Res. 2010 Jun;19(3):317-43. doi: 10.1177/0962280209105013. Epub 2009 Jul 16.