Poulet Pierre-Emmanuel, Tran Maylis, Tezenas du Montcel Sophie, Dubois Bruno, Durrleman Stanley, Jedynak Bruno
ARAMIS, Sorbonne Université, Paris, France.
Institut de Neurologie, Hôpital Salpêtrière Sorbonne-Université, 47 bd de l'hôpital, Paris, 75651, cedex 13, France.
BMC Med Res Methodol. 2025 Aug 29;25(1):204. doi: 10.1186/s12874-025-02647-6.
Prediction-powered inference (PPI) (Angelopoulos et al., Science 382(6671):669-674, 2023) and its extension PPI++ (Angelopoulos et al., 2023) offer a new approach to standard statistical estimation: they leverage machine learning systems to augment unlabeled data with predictions. We apply this paradigm to clinical trials. The predictions come from disease progression models, which provide a prognostic score for every participant as a function of baseline covariates. The proposed method could strengthen clinical trials by providing untreated digital twins of the treated patients while remaining statistically valid. This new estimator of the treatment effect in a two-arm randomized clinical trial (RCT) has several potential implications. First, it reduces the overall sample size required to reach the same power as a standard RCT. Second, it favors an unbalanced allocation between control and treated patients, so that fewer controls are needed to achieve the same power. Finally, it turns any disease prediction model trained on large cohorts directly into a practical and scientifically valid tool. In this paper, we establish the theoretical properties of this estimator and illustrate them through simulations. We show that it is asymptotically unbiased for the Average Treatment Effect (ATE) and derive an explicit formula for its variance. We then compare this estimator with a regression-based linear covariate adjustment method. An application to an Alzheimer's disease clinical trial illustrates the potential reduction in sample size.
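The abstract does not spell out the estimator, so the following is only an illustrative sketch of the general idea, not the estimator or variance formula derived in the paper. It assumes that the untreated (control) mean is estimated with the PPI++ mean estimator, treating the control arm as the labelled sample (n = n0) and the prognostic scores of the treated patients as the unlabelled predictions (N = n1). The tuning parameter uses the PPI++ plug-in choice for mean estimation, lambda = Cov(Y, f) / ((1 + n/N) Var(f)). The helper names `ppi_pp_ate` and `simulate_trial`, the data-generating model, the deliberately miscalibrated score `1.8 * x + 0.3`, and the 150:50 allocation are all hypothetical choices made for this example.

```python
import numpy as np


def ppi_pp_ate(y_treated, f_treated, y_control, f_control):
    """Hypothetical PPI++-style ATE sketch (not the paper's implementation)."""
    n1, n0 = len(y_treated), len(y_control)
    # PPI++ tuning parameter for a mean: labelled set = controls (n = n0),
    # unlabelled predictions = treated patients' scores (N = n1).
    cov_yf = np.cov(y_control, f_control, ddof=1)[0, 1]
    var_f = np.var(np.concatenate([f_treated, f_control]), ddof=1)
    lam = cov_yf / ((1.0 + n0 / n1) * var_f)
    # PPI++ estimate of the untreated mean: control mean corrected by the
    # difference between the treated and control prognostic scores.
    mu0_hat = y_control.mean() + lam * (f_treated.mean() - f_control.mean())
    return y_treated.mean() - mu0_hat, lam


def simulate_trial(rng, n_treated, n_control, tau):
    """Hypothetical two-arm RCT with one baseline covariate."""
    n = n_treated + n_control
    x = rng.normal(size=n)                         # baseline covariate
    y0 = 2.0 * x + rng.normal(scale=1.0, size=n)   # untreated outcome
    y1 = y0 + tau                                  # constant treatment effect
    f = 1.8 * x + 0.3                              # imperfect, miscalibrated prognostic model
    treated = np.zeros(n, dtype=bool)
    treated[rng.permutation(n)[:n_treated]] = True  # randomised allocation
    y = np.where(treated, y1, y0)
    return y[treated], f[treated], y[~treated], f[~treated]


rng = np.random.default_rng(0)
tau = -1.0
naive, ppi = [], []
for _ in range(2000):
    yt, ft, yc, fc = simulate_trial(rng, n_treated=150, n_control=50, tau=tau)
    naive.append(yt.mean() - yc.mean())             # plain difference in means
    ppi.append(ppi_pp_ate(yt, ft, yc, fc)[0])       # PPI++-style estimate
naive, ppi = np.array(naive), np.array(ppi)

print(f"difference in means: mean={naive.mean():.3f}, var={naive.var():.4f}")
print(f"PPI++-style sketch:  mean={ppi.mean():.3f}, var={ppi.var():.4f}")
```

Under randomization, the treated and control arms share the same distribution of baseline covariates, so the correction term `f_treated.mean() - f_control.mean()` has mean zero for any fixed lambda. For this reason the sketch stays centred on `tau` even though the prognostic model is miscalibrated, while the score absorbs part of the outcome variance. This lambda is tuned for the control mean alone rather than for the ATE as a whole, which is a simplification of this example.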