UC Berkeley Center for Targeted Learning, Berkeley, CA, USA.
Unlearn.AI, Inc., San Francisco, CA, USA.
Int J Biostat. 2021 Dec 22;18(2):329-356. doi: 10.1515/ijb-2021-0072. eCollection 2022 Nov 1.
Estimating causal effects from randomized experiments is central to clinical research. Reducing the statistical uncertainty in these analyses is an important objective for statisticians. Registries, prior trials, and health records constitute a growing compendium of historical data on patients under standard-of-care that may be exploitable to this end. However, most methods for historical borrowing achieve reductions in variance by sacrificing strict type-I error rate control. Here, we propose a use of historical data that exploits linear covariate adjustment to improve the efficiency of trial analyses without incurring bias. Specifically, we train a prognostic model on the historical data, then estimate the treatment effect using a linear regression while adjusting for the trial subjects' predicted outcomes (their prognostic scores). We prove that, under certain conditions, this prognostic covariate adjustment procedure attains the minimum variance possible among a large class of estimators. When those conditions are not met, prognostic covariate adjustment is still more efficient than raw covariate adjustment and the gain in efficiency is proportional to a measure of the predictive accuracy of the prognostic model above and beyond the linear relationship with the raw covariates. We demonstrate the approach using simulations and a reanalysis of an Alzheimer's disease clinical trial and observe meaningful reductions in mean-squared error and the estimated variance. Lastly, we provide a simplified formula for asymptotic variance that enables power calculations that account for these gains. Sample size reductions between 10% and 30% are attainable when using prognostic models that explain a clinically realistic percentage of the outcome variance.
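The two-step procedure described in the abstract (fit a prognostic model on historical control data, then adjust for its predictions in a linear regression on the trial data) can be illustrated with a minimal sketch on synthetic data. Everything below is an assumption made for illustration and is not taken from the paper: the data-generating process, the polynomial least-squares fit standing in for the prognostic model, the HC0 robust standard error, and all function and variable names.

```python
import numpy as np


def poly_features(X):
    # Intercept, linear, and squared terms of each covariate.
    return np.column_stack([np.ones(len(X)), X, X ** 2])


def fit_prognostic_model(X_hist, y_hist):
    # Hypothetical prognostic model: least squares on polynomial features.
    # Any predictive model trained on historical controls could stand in here.
    coef, *_ = np.linalg.lstsq(poly_features(X_hist), y_hist, rcond=None)
    return lambda X: poly_features(X) @ coef


def adjusted_effect(y, treat, covariates):
    # OLS of outcome on intercept, treatment indicator, and covariates.
    Z = np.column_stack([np.ones(len(y)), treat, covariates])
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    resid = y - Z @ beta
    # HC0 sandwich covariance (an assumption of this sketch).
    bread = np.linalg.inv(Z.T @ Z)
    meat = Z.T @ (Z * (resid ** 2)[:, None])
    cov = bread @ meat @ bread
    # Return the treatment coefficient and its standard error.
    return float(beta[1]), float(np.sqrt(cov[1, 1]))


def control_mean(X):
    # Synthetic outcome under standard of care, nonlinear in the second covariate.
    return X[:, 0] + 2.0 * X[:, 1] ** 2


rng = np.random.default_rng(0)
TRUE_EFFECT = 1.0

# Step 1: historical data (controls only) used to train the prognostic model.
X_hist = rng.normal(size=(5000, 2))
y_hist = control_mean(X_hist) + rng.normal(size=5000)
prognostic_model = fit_prognostic_model(X_hist, y_hist)

# Step 2: a randomized trial with a constant additive treatment effect.
X_trial = rng.normal(size=(500, 2))
treat = rng.integers(0, 2, size=500)
y_trial = control_mean(X_trial) + TRUE_EFFECT * treat + rng.normal(size=500)

# Prognostic score: the model's predicted control outcome for each trial subject.
score = prognostic_model(X_trial)

# Raw covariate adjustment versus adjustment that also includes the score.
est_raw, se_raw = adjusted_effect(y_trial, treat, X_trial)
est_prog, se_prog = adjusted_effect(
    y_trial, treat, np.column_stack([X_trial, score])
)

print(f"raw adjustment:        estimate={est_raw:.3f}, se={se_raw:.3f}")
print(f"prognostic adjustment: estimate={est_prog:.3f}, se={se_prog:.3f}")
```

In this synthetic setup the score captures nonlinear structure that the raw linear terms miss, so the standard error under prognostic adjustment should come out smaller than under raw adjustment. This mirrors the abstract's efficiency claim only qualitatively; the size of the gain depends entirely on the assumed data-generating process.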