Zhao Yufan, Zeng Donglin, Socinski Mark A, Kosorok Michael R
Global Biostatistics and Epidemiology, Amgen Inc., One Amgen Center Drive, Thousand Oaks, California 91320, USA.
Biometrics. 2011 Dec;67(4):1422-33. doi: 10.1111/j.1541-0420.2011.01572.x. Epub 2011 Mar 8.
Typical regimens for advanced metastatic stage IIIB/IV non-small cell lung cancer (NSCLC) consist of multiple lines of treatment. We present an adaptive reinforcement learning approach to discover optimal individualized treatment regimens from a specially designed clinical trial (a "clinical reinforcement trial") of an experimental treatment for patients with advanced NSCLC who have not previously received systemic therapy. Beyond the already complex problem of selecting optimal compounds for first- and second-line treatment based on prognostic factors, a further primary goal is to determine the optimal time to initiate second-line therapy, either immediately after induction therapy or after a delay, so as to yield the longest overall survival time. We use a reinforcement learning method called Q-learning, which learns an optimal regimen from the patient data generated by the clinical reinforcement trial. The Q-function, with time-indexed parameters, is approximated using a modification of support vector regression that accommodates censored data. Within this framework, a simulation study shows that the procedure can extract optimal regimens for two lines of treatment directly from clinical data without prior knowledge of the treatment effect mechanism. In addition, we demonstrate that the design reliably selects the best time to initiate second-line therapy while accounting for the heterogeneity of NSCLC across patients.
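The two-stage Q-learning idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: ridge least squares on a small polynomial basis stands in for the censored support vector regression, there is no censoring, and the data come from a toy simulator. The scalar state (a hypothetical prognostic score), the scalar action on [0, 1] (a hypothetical dose or timing choice), the reward model, and all function names are assumptions made for illustration only.

```python
import numpy as np


def features(state, action):
    """Polynomial basis in (state, action); an illustrative choice, not from the paper."""
    s = np.asarray(state, dtype=float)
    a = np.asarray(action, dtype=float)
    return np.column_stack([np.ones_like(s), s, s ** 2, a, s * a, a ** 2])


def fit_q(state, action, target, ridge=1e-6):
    """Ridge least-squares fit of Q(state, action); stands in for censored SVR."""
    X = features(state, action)
    return np.linalg.solve(X.T @ X + ridge * np.eye(X.shape[1]), X.T @ target)


def best_action(theta, state, grid):
    """For each state, return the grid action maximising the fitted Q and its value."""
    s = np.asarray(state, dtype=float)
    q = np.stack([features(s, np.full_like(s, a)) @ theta for a in grid], axis=1)
    return grid[q.argmax(axis=1)], q.max(axis=1)


def two_stage_q_learning(s1, a1, r1, s2, a2, r2, grid):
    """Backward induction: fit stage 2 first, then stage 1 on reward plus future value."""
    theta2 = fit_q(s2, a2, r2)
    _, v2 = best_action(theta2, s2, grid)   # estimated value of acting optimally at stage 2
    theta1 = fit_q(s1, a1, r1 + v2)         # stage-1 target includes the delayed benefit
    return theta1, theta2


def simulate(n=2000, seed=0):
    """Toy randomised two-stage trial (hypothetical reward model, no censoring)."""
    rng = np.random.default_rng(seed)
    grid = np.linspace(0.0, 1.0, 21)
    s1 = rng.uniform(0.0, 1.0, n)
    a1 = rng.choice(grid, n)                                   # randomised first-line action
    r1 = -(a1 - s1) ** 2 + rng.normal(0.0, 0.05, n)
    s2 = 0.5 * s1 + 0.5 * a1 + rng.normal(0.0, 0.02, n)        # stage-2 state depends on a1
    a2 = rng.choice(grid, n)                                   # randomised second-line action
    r2 = s2 - (a2 - s2) ** 2 + rng.normal(0.0, 0.05, n)
    return s1, a1, r1, s2, a2, r2, grid


if __name__ == "__main__":
    s1, a1, r1, s2, a2, r2, grid = simulate()
    theta1, theta2 = two_stage_q_learning(s1, a1, r1, s2, a2, r2, grid)
    print("stage-1 recommendation:", best_action(theta1, np.array([0.2, 0.5]), grid)[0])
    print("stage-2 recommendation:", best_action(theta2, np.array([0.2, 0.8]), grid)[0])
```

In this toy model the first-line action raises the second-stage state, so the best first-line choice (state + 0.25) differs from the one that maximises the immediate reward alone (state). Fitting the stage-1 Q-function to the immediate reward plus the estimated stage-2 value is what lets the procedure pick this up.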