Optimising precision and power by machine learning in randomised trials with ordinal and time-to-event outcomes with an application to COVID-19.
Author information
Williams Nicholas, Rosenblum Michael, Díaz Iván
Affiliations
Department of Epidemiology, Columbia University Mailman School of Public Health, New York City, New York, USA.
Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, Maryland, USA.
Publication information
J R Stat Soc Ser A Stat Soc. 2022 Sep 23. doi: 10.1111/rssa.12915.
The rapid finding of effective therapeutics requires efficient use of available resources in clinical trials. Covariate adjustment can yield statistical estimates with improved precision, resulting in a reduction in the number of participants required to draw futility or efficacy conclusions. We focus on time-to-event and ordinal outcomes. When more than a few baseline covariates are available, a key question for covariate adjustment in randomised studies is how to fit a model relating the outcome and the baseline covariates to maximise precision. We present a novel theoretical result establishing conditions for asymptotic normality of a variety of covariate-adjusted estimators that rely on machine learning (e.g., ℓ1-regularisation, Random Forests, XGBoost, and Multivariate Adaptive Regression Splines [MARS]), under the assumption that outcome data are missing completely at random. We further present a consistent estimator of the asymptotic variance. Importantly, the conditions do not require the machine learning methods to converge to the true outcome distribution conditional on baseline variables, as long as they converge to some (possibly incorrect) limit. We conducted a simulation study to evaluate the performance of the aforementioned prediction methods in COVID-19 trials. Our simulation is based on resampling longitudinal data from over 1500 patients hospitalised with COVID-19 at Weill Cornell Medicine New York Presbyterian Hospital. We found that using ℓ1-regularisation led to estimators and corresponding hypothesis tests that control type 1 error and are more precise than an unadjusted estimator across all sample sizes tested. We also show that when covariates are not prognostic of the outcome, ℓ1-regularisation remains as precise as the unadjusted estimator, even at small sample sizes. We provide an R package, adjrct, that performs model-robust covariate adjustment for ordinal and time-to-event outcomes.
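The precision gain from covariate adjustment described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' method and does not use the adjrct package. It assumes a continuous outcome, a single prognostic baseline covariate, and a per-arm linear working model, in place of the ordinal and time-to-event estimators and machine learning fits studied in the paper. All function names and parameter values (`simulate_trial`, `compare`, `beta`, and so on) are hypothetical and chosen for illustration only.

```python
import random
import statistics


def simulate_trial(n, effect, beta, rng):
    """Simulate one two-arm trial: rows of (covariate, arm, outcome)."""
    rows = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)   # baseline covariate
        a = rng.randint(0, 1)     # 1:1 randomisation, independent of x
        y = effect * a + beta * x + rng.gauss(0.0, 1.0)
        rows.append((x, a, y))
    return rows


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def unadjusted(rows):
    """Difference in arm-specific sample means."""
    treated = [y for _, a, y in rows if a == 1]
    control = [y for _, a, y in rows if a == 0]
    return statistics.fmean(treated) - statistics.fmean(control)


def adjusted(rows):
    """Standardisation: fit a working model in each arm, predict the
    outcome for every participant under both arms, and average."""
    all_x = [x for x, _, _ in rows]
    arm_means = {}
    for arm in (0, 1):
        xs = [x for x, a, _ in rows if a == arm]
        ys = [y for _, a, y in rows if a == arm]
        intercept, slope = fit_line(xs, ys)
        arm_means[arm] = statistics.fmean([intercept + slope * x for x in all_x])
    return arm_means[1] - arm_means[0]


def compare(n=200, effect=0.5, beta=2.0, reps=500, seed=1):
    """Empirical variance of both estimators over repeated simulated trials."""
    rng = random.Random(seed)
    unadj, adj = [], []
    for _ in range(reps):
        rows = simulate_trial(n, effect, beta, rng)
        unadj.append(unadjusted(rows))
        adj.append(adjusted(rows))
    return statistics.variance(unadj), statistics.variance(adj)


if __name__ == "__main__":
    v_unadj, v_adj = compare()
    print(f"unadjusted variance: {v_unadj:.4f}")
    print(f"adjusted variance:   {v_adj:.4f}")
```

With a strongly prognostic covariate (`beta=2.0` here), the adjusted estimator should show a clearly smaller variance across simulated trials, which is the mechanism by which adjustment reduces the required sample size. Setting `beta=0.0` mimics the non-prognostic case, where the two estimators are expected to perform similarly.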