Krakow Elizabeth F, Hemmer Michael, Wang Tao, Logan Brent, Arora Mukta, Spellman Stephen, Couriel Daniel, Alousi Amin, Pidala Joseph, Last Michael, Lachance Silvy, Moodie Erica E M
Am J Epidemiol. 2017 Jul 15;186(2):160-172. doi: 10.1093/aje/kwx027.
Q-learning is a method of reinforcement learning that employs backwards stagewise estimation to identify sequences of actions that maximize some long-term reward. The method can be applied to sequential multiple-assignment randomized trials to develop personalized adaptive treatment strategies (ATSs), which are longitudinal practice guidelines highly tailored to time-varying attributes of individual patients. Sometimes, the basis for choosing which ATSs to include in a sequential multiple-assignment randomized trial (or randomized controlled trial) may be inadequate. Nonrandomized data sources may inform the initial design of ATSs, which could later be prospectively validated. In this paper, we illustrate the challenges involved in using nonrandomized data for this purpose with a case study from the Center for International Blood and Marrow Transplant Research registry (1995-2007) aimed at 1) determining whether the sequence of therapeutic classes used in graft-versus-host disease prophylaxis and in refractory graft-versus-host disease is associated with improved survival and 2) identifying donor and patient factors with which to guide individualized immunosuppressant selections over time. We discuss how to communicate the potential benefit derived from following an ATS at the population and subgroup levels and how to evaluate its robustness to modeling assumptions. This worked example may serve as a model for developing ATSs from registries and cohorts in oncology and other fields requiring sequential treatment decisions.
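To make the backwards stagewise estimation concrete, the following is a minimal sketch of two-stage Q-learning with linear working models. It is not the analysis performed in the paper: the data are simulated, both treatments are binary and randomized (so the confounding adjustment that registry data would require is omitted), the outcome is a continuous reward rather than survival, and all names (`two_stage_q_learning`, `simulate`, `x1`, `a1`, `x2`, `a2`, `y`) are hypothetical.

```python
import numpy as np


def ols(X, y):
    """Ordinary least squares coefficients for y ~ X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


def two_stage_q_learning(x1, a1, x2, a2, y):
    """Two-stage Q-learning with linear Q-functions (illustrative only).

    x1, x2: covariates observed before stage 1 and stage 2 (assumed names)
    a1, a2: binary treatments coded 0/1
    y:      final reward, larger is better
    """
    ones = np.ones_like(y)

    # Stage 2 (fitted first): regress the observed reward on history,
    # treatment, and a treatment-by-covariate interaction.
    main2 = np.column_stack([ones, x1, a1, a1 * x1, x2])
    X2 = np.column_stack([main2, a2, a2 * x2])
    beta2 = ols(X2, y)
    k = main2.shape[1]

    # Estimated gain from choosing a2 = 1 over a2 = 0 for each subject.
    blip2 = beta2[k] + beta2[k + 1] * x2

    # Pseudo-outcome: predicted reward if the best stage-2 action were taken.
    pseudo = main2 @ beta2[:k] + np.maximum(blip2, 0.0)

    # Stage 1: regress the pseudo-outcome on stage-1 information.
    X1 = np.column_stack([ones, x1, a1, a1 * x1])
    beta1 = ols(X1, pseudo)

    return {"stage1_blip": beta1[2:], "stage2_blip": beta2[k:]}


def simulate(n=5000, seed=0):
    """Simulated trial-like data with known tailoring effects (assumption)."""
    rng = np.random.default_rng(seed)
    x1 = rng.normal(size=n)
    a1 = rng.integers(0, 2, size=n).astype(float)
    x2 = 0.5 * x1 + rng.normal(size=n)
    a2 = rng.integers(0, 2, size=n).astype(float)
    y = (1 + x1 + a1 * (1 - 2 * x1) + a2 * (0.5 + 1.5 * x2)
         + rng.normal(size=n))
    return x1, a1, x2, a2, y


fit = two_stage_q_learning(*simulate())
s1, s2 = fit["stage1_blip"], fit["stage2_blip"]
print(f"Stage 1: choose a1 = 1 when {s1[0]:.2f} + ({s1[1]:.2f}) * x1 > 0")
print(f"Stage 2: choose a2 = 1 when {s2[0]:.2f} + ({s2[1]:.2f}) * x2 > 0")
```

The estimated rules should land close to the simulated truth (treat at stage 1 when `1 - 2 * x1 > 0`, at stage 2 when `0.5 + 1.5 * x2 > 0`). With nonrandomized data such as a registry, each stage's regression would additionally need to account for confounders of treatment choice, which this sketch deliberately leaves out.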