Yebin Tao, Lu Wang, Daniel Almirall
Department of Biostatistics, University of Michigan, Ann Arbor, Michigan 48109, USA.
Institute for Social Research, University of Michigan, Ann Arbor, Michigan 48104, USA.
Ann Appl Stat. 2018 Sep;12(3):1914-1938. doi: 10.1214/18-AOAS1137. Epub 2018 Sep 11.
Dynamic treatment regimes (DTRs) are sequences of treatment decision rules in which treatment may be adapted over time in response to the changing course of an individual. Motivated by a substance use disorder (SUD) study, we propose a tree-based reinforcement learning (T-RL) method to directly estimate optimal DTRs in a multi-stage, multi-treatment setting. At each stage, T-RL builds an unsupervised decision tree that directly handles the problem of optimization with multiple treatment comparisons, through a purity measure constructed with augmented inverse probability weighted (AIPW) estimators. Across the multiple stages, the algorithm is implemented recursively using backward induction. By combining semiparametric regression with flexible tree-based learning, T-RL is robust, efficient, and easy to interpret for the identification of optimal DTRs, as shown in the simulation studies. With the proposed method, we identify dynamic SUD treatment regimes for adolescents.
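To illustrate the core single-stage idea, the following is a minimal sketch, not the authors' implementation: AIPW pseudo-outcomes are computed for each treatment arm, and a depth-one "tree" (a single split) is chosen to maximize the estimated value, which stands in for the paper's purity measure. The data-generating setup, the crude arm-mean outcome model, and the exhaustive cutoff grid are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n, K = 500, 3                           # subjects and number of treatments (assumed)
x = rng.uniform(-1, 1, size=(n, 2))     # two baseline covariates
a = rng.integers(0, K, size=n)          # randomized treatment, known propensity 1/K
# Hypothetical true optimal rule: treatment 0 if x[:,0] < 0, else treatment 1
opt = np.where(x[:, 0] < 0, 0, 1)
y = 1.0 * (a == opt) + rng.normal(0, 0.1, size=n)

pi = np.full((n, K), 1.0 / K)           # known randomization probabilities
mu = np.zeros((n, K))                   # outcome model: crude arm means for illustration
for k in range(K):
    mu[:, k] = y[a == k].mean()

# AIPW pseudo-outcomes: per-subject estimates of the mean outcome under each arm,
# combining the outcome model with an inverse-probability-weighted residual
aipw = mu + (a[:, None] == np.arange(K)) * (y[:, None] - mu) / pi

def stump_value(j, c):
    """Estimated value of the rule: best arm on each side of the split x[:, j] <= c."""
    left = x[:, j] <= c
    v = 0.0
    for side in (left, ~left):
        if side.any():
            v += aipw[side].sum(axis=0).max()   # assign the best arm to this node
    return v / n

# Exhaustive search over single-split rules: a depth-1 stand-in for the
# purity-maximizing tree construction described in the abstract
best = max(((j, c) for j in range(2) for c in np.linspace(-1, 1, 41)),
           key=lambda jc: stump_value(*jc))
print("split variable:", best[0], "cutoff: %.2f" % best[1])
```

The recovered split should be close to the assumed true rule (variable 0, cutoff near 0). The full T-RL method grows deeper trees with a formal purity criterion and applies this construction recursively backward across stages, replacing the observed outcome at each stage with the estimated optimal value of future stages.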