Q-LEARNING WITH CENSORED DATA.

Author information

Goldberg Yair, Kosorok Michael R

Affiliation

Department of Biostatistics, The University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, U.S.A.

Publication information

Ann Stat. 2012 Feb 1;40(1):529-560. doi: 10.1214/12-AOS968.

Abstract

We develop a methodology for a multistage-decision problem with a flexible number of stages in which the rewards are survival times that are subject to censoring. We present a novel Q-learning algorithm that is adjusted for censored data and allows a flexible number of stages. We provide finite sample bounds on the generalization error of the policy learned by the algorithm, and show that when the optimal Q-function belongs to the approximation space, the expected survival time for policies obtained by the algorithm converges to that of the optimal policy. We simulate a multistage clinical trial with a flexible number of stages and apply the proposed censored-Q-learning algorithm to find individualized treatment regimens. The methodology presented in this paper has implications in the design of personalized medicine trials in cancer and in other life-threatening diseases.
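
The abstract does not say how the Q-learning algorithm is adjusted for censoring, so the following is only an illustrative sketch, not the authors' algorithm. It uses a standard stand-in technique, inverse-probability-of-censoring weighting (IPCW), inside a two-stage backward-induction Q-learning fit. Everything named here is an assumption made for illustration: the function names, the linear basis `(1, s, a, s*a)`, the binary actions in {-1, +1}, the restriction to two stages, and the premise that censoring is independent of states, actions, and survival. Subjects who fail before the second stage simply contribute no second-stage reward, which is one simple way to mimic a flexible number of stages.

```python
import numpy as np

def censoring_survival(times, events):
    """Kaplan-Meier estimate of G(t-) = P(C >= t) at each subject's own time.

    events: 1 = failure observed, 0 = censored. Censoring plays the role of
    the "event" here; failures tied with a censoring time are taken to occur first.
    """
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=int)
    g = np.ones(len(times))
    for u in np.unique(times[events == 0]):
        n_censored = np.sum((times == u) & (events == 0))
        n_at_risk = np.sum(times > u) + n_censored
        g[times > u] *= 1.0 - n_censored / n_at_risk
    return g

def ipcw_weights(times, events):
    """Weight events / G(T-): censored subjects get 0, observed failures are up-weighted."""
    return np.asarray(events, dtype=float) / censoring_survival(times, events)

def phi(state, action):
    """Hypothetical linear Q-function basis (1, s, a, s*a), with a in {-1, +1}."""
    s = np.asarray(state, dtype=float)
    a = np.asarray(action, dtype=float)
    return np.column_stack([np.ones_like(s), s, a, s * a])

def weighted_fit(features, targets, weights):
    """Weighted least squares, implemented by scaling rows with sqrt(weight)."""
    root = np.sqrt(weights)
    coef, *_ = np.linalg.lstsq(features * root[:, None], targets * root, rcond=None)
    return coef

def greedy(coef, state, actions=(-1.0, 1.0)):
    """Return (max_a Q(s, a), argmax_a Q(s, a)) under the fitted coefficients."""
    s = np.asarray(state, dtype=float)
    q = np.column_stack([phi(s, np.full_like(s, a)) @ coef for a in actions])
    return q.max(axis=1), np.asarray(actions)[q.argmax(axis=1)]

def censored_q_learning(s1, a1, r1, reached2, s2, a2, r2, events):
    """Two-stage backward induction; r1 and r2 are the times survived in each stage."""
    reached2 = np.asarray(reached2, dtype=bool)
    r1 = np.asarray(r1, dtype=float)
    r2 = np.where(reached2, np.asarray(r2, dtype=float), 0.0)
    w = ipcw_weights(r1 + r2, events)  # weights based on total follow-up time
    # Stage 2: only subjects who entered the second stage contribute.
    coef2 = weighted_fit(phi(s2, a2)[reached2], r2[reached2], w[reached2])
    # Stage 1: the pseudo-outcome adds the best predicted stage-2 survival, if any.
    v2, _ = greedy(coef2, np.where(reached2, s2, 0.0))
    target1 = r1 + np.where(reached2, v2, 0.0)
    coef1 = weighted_fit(phi(s1, a1), target1, w)
    return coef1, coef2

# Toy check of the weights: the subject censored at t = 2 gets weight 0 and the
# two later failures are each weighted by roughly 1.5.
toy_weights = ipcw_weights([1.0, 2.0, 3.0, 4.0], [1, 0, 1, 1])
```

Given the fitted coefficients, `greedy(coef1, s)` and `greedy(coef2, s)` return the recommended action at each stage. Weighting only the uncensored trajectories keeps the sketch short, but it discards the partial information carried by censored subjects, which a purpose-built method would likely try to use.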

Similar articles

1
Q-LEARNING WITH CENSORED DATA.
Ann Stat. 2012 Feb 1;40(1):529-560. doi: 10.1214/12-AOS968.
2
Imputation-based Q-learning for optimizing dynamic treatment regimes with right-censored survival outcome.
Biometrics. 2023 Dec;79(4):3676-3689. doi: 10.1111/biom.13872. Epub 2023 May 17.
3
Tree based weighted learning for estimating individualized treatment rules with censored data.
Electron J Stat. 2017;11(2):3927-3953. doi: 10.1214/17-EJS1305. Epub 2017 Oct 18.
4
A Generalization Error for Q-Learning.
J Mach Learn Res. 2005 Jul;6:1073-1097.
5
Doubly Robust Learning for Estimating Individualized Treatment with Censored Data.
Biometrika. 2015 Mar 1;102(1):151-168. doi: 10.1093/biomet/asu050.
6
Model selection for survival individualized treatment rules using the jackknife estimator.
BMC Med Res Methodol. 2022 Dec 22;22(1):328. doi: 10.1186/s12874-022-01811-6.
7
New Statistical Learning Methods for Estimating Optimal Dynamic Treatment Regimes.
J Am Stat Assoc. 2015;110(510):583-598. doi: 10.1080/01621459.2014.937488.
8
A Turbo Q-Learning (TQL) for Energy Efficiency Optimization in Heterogeneous Networks.
Entropy (Basel). 2020 Aug 30;22(9):957. doi: 10.3390/e22090957.
9
M-Learning for Individual Treatment Rule With Survival Outcomes.
Stat Med. 2025 May;44(10-12):e70093. doi: 10.1002/sim.70093.
10
Off-Policy Interleaved Q-Learning: Optimal Control for Affine Nonlinear Discrete-Time Systems.
IEEE Trans Neural Netw Learn Syst. 2019 May;30(5):1308-1320. doi: 10.1109/TNNLS.2018.2861945. Epub 2018 Sep 26.

Cited by

1
Sparse 2-stage Bayesian meta-analysis for individualized treatments.
Biometrics. 2025 Jul 3;81(3). doi: 10.1093/biomtc/ujaf082.
2
Estimating individualized treatment rules by optimizing the adjusted probability of a longer survival.
Stat Methods Med Res. 2024 Sep;33(9):1517-1530. doi: 10.1177/09622802241262525. Epub 2024 Jul 25.
4
Dynamic Treatment Regimes Using Bayesian Additive Regression Trees for Censored Outcomes.
Lifetime Data Anal. 2024 Jan;30(1):181-212. doi: 10.1007/s10985-023-09605-8. Epub 2023 Sep 2.
5
Multi-stage optimal dynamic treatment regimes for survival outcomes with dependent censoring.
Biometrika. 2022 Aug 13;110(2):395-410. doi: 10.1093/biomet/asac047. eCollection 2023 Jun.
6
Model selection for survival individualized treatment rules using the jackknife estimator.
BMC Med Res Methodol. 2022 Dec 22;22(1):328. doi: 10.1186/s12874-022-01811-6.
7
Semiparametric single-index models for optimal treatment regimens with censored outcomes.
Lifetime Data Anal. 2022 Oct;28(4):744-763. doi: 10.1007/s10985-022-09566-4. Epub 2022 Aug 8.
8
A general framework for subgroup detection via one-step value difference estimation.
Biometrics. 2023 Sep;79(3):2116-2126. doi: 10.1111/biom.13711. Epub 2022 Aug 2.
10
Reinforcement Learning for Precision Oncology.
Cancers (Basel). 2021 Sep 15;13(18):4624. doi: 10.3390/cancers13184624.

References

1
The Kaplan-Meier Estimator as an Inverse-Probability-of-Censoring Weighted Average.
Am Stat. 2001;55(3):207-210. doi: 10.1198/000313001317098185. Epub 2012 Jan 1.
3
Weighted Kaplan-Meier estimators for two-stage treatment regimes.
Stat Med. 2010 Nov 10;29(25):2581-91. doi: 10.1002/sim.4020.
5
Reinforcement learning design for cancer clinical trials.
Stat Med. 2009 Nov 20;28(26):3294-315. doi: 10.1002/sim.3720.
6
Causal effect models for realistic individualized treatment and intention to treat rules.
Int J Biostat. 2007;3(1):Article 3. doi: 10.2202/1557-4679.1022.
7
Estimation and extrapolation of optimal treatment and testing strategies.
Stat Med. 2008 Oct 15;27(23):4678-721. doi: 10.1002/sim.3301.
8
Considerations for second-line therapy of non-small cell lung cancer.
Oncologist. 2008;13 Suppl 1:28-36. doi: 10.1634/theoncologist.13-S1-28.
9
An overview of statistical learning theory.
IEEE Trans Neural Netw. 1999;10(5):988-99. doi: 10.1109/72.788640.
10
On an exponential bound for the Kaplan-Meier estimator.
Lifetime Data Anal. 2007 Dec;13(4):481-96. doi: 10.1007/s10985-007-9055-z. Epub 2007 Aug 31.
