Q-Learning with Censored Data

Authors

Yair Goldberg, Michael R. Kosorok

Affiliation

Department of Biostatistics, The University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, U.S.A.

Publication information

Ann Stat. 2012 Feb 1;40(1):529-560. doi: 10.1214/12-AOS968.

PMID: 22754029

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3385950/

Abstract

We develop methodology for a multistage decision problem with a flexible number of stages in which the rewards are survival times that are subject to censoring. We present a novel Q-learning algorithm that is adjusted for censored data and allows a flexible number of stages. We provide finite-sample bounds on the generalization error of the policy learned by the algorithm, and show that when the optimal Q-function belongs to the approximation space, the expected survival time for policies obtained by the algorithm converges to that of the optimal policy. We simulate a multistage clinical trial with a flexible number of stages and apply the proposed censored-Q-learning algorithm to find individualized treatment regimens. The methodology presented in this paper has implications for the design of personalized medicine trials in cancer and in other life-threatening diseases.
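The abstract does not spell out how the Q-learning fit is adjusted for censoring. As a rough illustration only, the sketch below shows one common adjustment for right-censored rewards, inverse-probability-of-censoring weighting (IPCW), applied to a single-stage, linear Q-function. It is not the authors' algorithm: the function names, the single-stage setup, the linear model, and the no-ties assumption are all choices made for this example, and the multistage backward recursion described in the paper is not reproduced.

```python
import numpy as np

def censoring_survival_before(t_obs, delta):
    """Kaplan-Meier estimate of G(t-) = P(C >= t) at each observed time.

    t_obs : observed times, min(survival time, censoring time)
    delta : 1 if the survival time was observed, 0 if censored
    Assumes (for this sketch) no ties between failure and censoring times.
    """
    t_obs = np.asarray(t_obs, dtype=float)
    delta = np.asarray(delta, dtype=int)
    g = np.ones_like(t_obs)
    # Censoring plays the role of the "event" when estimating G.
    for u in np.unique(t_obs[delta == 0]):
        at_risk = np.sum(t_obs >= u)
        n_cens = np.sum((t_obs == u) & (delta == 0))
        # Only subjects observed strictly after u pick up this factor,
        # which gives the left limit G(t-).
        g[t_obs > u] *= 1.0 - n_cens / at_risk
    return g

def fit_q_ipcw(features, t_obs, delta):
    """Weighted least squares fit of a linear Q-function with IPCW weights.

    features : (n, p) array of hypothetical state/action features
    Returns the coefficient vector beta with Q(s, a) ~ features @ beta.
    """
    features = np.asarray(features, dtype=float)
    t_obs = np.asarray(t_obs, dtype=float)
    delta = np.asarray(delta, dtype=int)
    g = censoring_survival_before(t_obs, delta)
    # Uncensored subjects are up-weighted; censored subjects get weight 0.
    w = delta / np.maximum(g, 1e-12)
    sw = np.sqrt(w)
    beta, *_ = np.linalg.lstsq(features * sw[:, None], t_obs * sw, rcond=None)
    return beta
```

With an intercept-only feature matrix, `fit_q_ipcw` returns the IPCW-weighted mean of the observed survival times. With an intercept and a treatment indicator as features, comparing the fitted values across treatments gives a greedy one-stage treatment rule under the assumptions above.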
