Gao Daiqi, Liu Yufeng, Zeng Donglin
Department of Statistics and Operations Research, The University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA.
Department of Statistics and Operations Research, Department of Genetics, Department of Biostatistics, The University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA.
J Mach Learn Res. 2022;23(250).
Learning optimal individualized treatment rules (ITRs) has become increasingly important in the modern era of precision medicine. Many statistical and machine learning methods for learning optimal ITRs have been developed in the literature. However, most existing methods are based on data collected from traditional randomized controlled trials and thus cannot take advantage of the evidence that accumulates as patients enter a trial sequentially. It is also ethically important that future patients have a high probability of being treated optimally based on the knowledge accumulated so far. In this work, we propose a new design, called sequentially rule-adaptive trials, to learn optimal ITRs within the contextual bandit framework, in contrast to the response-adaptive design of traditional adaptive trials. In our design, each entering patient is allocated, with high probability, to the current estimated best treatment for that patient, where the estimate is obtained from the accumulated data via a machine learning algorithm (outcome weighted learning in our implementation). We explore the tradeoff between the training and test values of the estimated ITR in single-stage problems by proving that a higher probability of following the estimated ITR makes the training value converge to the optimal value at a faster rate, while the test value converges at a slower rate. This problem differs from traditional decision problems in that the training data are generated sequentially and are dependent. We also develop a tool that combines martingale theory with empirical processes to handle this dependence, which cannot be addressed by previous techniques for i.i.d. data. Numerical examples show that, with little loss of test value, our proposed algorithm can improve the training value significantly compared with existing methods. Finally, we use a real data study to illustrate the performance of the proposed method.
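The allocation mechanism described in the abstract can be summarized in a short sketch. The Python code below is a minimal illustration rather than the authors' implementation: draw_patient, observe_outcome, fit_itr, p_follow, and burn_in are hypothetical stand-ins for patient accrual, outcome observation, the ITR estimator (outcome weighted learning in the paper), the probability of following the current estimated rule, and an initial randomization period.

```python
import numpy as np

def sequentially_rule_adaptive_trial(draw_patient, observe_outcome, fit_itr,
                                     n_patients, p_follow=0.9, burn_in=20,
                                     rng=None):
    """Sketch of a sequentially rule-adaptive trial with binary treatment.

    draw_patient()          -> covariate vector x for the next entering patient
    observe_outcome(x, a)   -> scalar reward for treating x with a in {-1, +1}
    fit_itr(X, A, R, P)     -> estimated rule d with d(x) in {-1, +1}
                               (e.g., outcome weighted learning using
                               propensities P as inverse weights)
    """
    rng = rng or np.random.default_rng()
    X, A, R, P = [], [], [], []
    rule = None
    for i in range(n_patients):
        x = draw_patient()
        if rule is None:
            # Pure randomization until enough data have accumulated.
            prob_plus = 0.5
        else:
            # Allocate to the current estimated best treatment for this
            # patient with probability p_follow.
            prob_plus = p_follow if rule(x) == 1 else 1.0 - p_follow
        a = 1 if rng.random() < prob_plus else -1
        r = observe_outcome(x, a)
        X.append(x); A.append(a); R.append(r)
        # Record the allocation probability of the treatment actually given.
        P.append(prob_plus if a == 1 else 1.0 - prob_plus)
        if i + 1 >= burn_in:
            # Refit the ITR on all data accumulated so far.
            rule = fit_itr(np.array(X), np.array(A), np.array(R), np.array(P))
    return rule, (np.array(X), np.array(A), np.array(R), np.array(P))
```

Because every allocation probability is recorded, the accumulated data remain usable for propensity-weighted estimators such as outcome weighted learning, even though, unlike in an i.i.d. trial, each patient's assignment depends on all earlier patients through the refitted rule.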