Richard D Riley, Ben Van Calster, Gary S Collins
Centre for Prognosis Research, School of Medicine, Keele University, Staffordshire, UK.
Department of Development and Regeneration, KU Leuven, Leuven, Belgium.
Stat Med. 2021 Feb 20;40(4):859-864. doi: 10.1002/sim.8806. Epub 2020 Dec 7.
In 2019 we published a pair of articles in Statistics in Medicine describing how to calculate the minimum sample size for developing a multivariable prediction model with a continuous outcome, or with a binary or time-to-event outcome. As with any sample size calculation, the approach requires the user to specify anticipated values for key parameters. In particular, for a prediction model with a binary outcome, the user must specify the outcome proportion and a conservative estimate of the developed model's overall fit, as measured by the Cox-Snell R² (proportion of variance explained). This raises the question of how to identify a plausible value for R² before the model is developed. Our articles suggest that researchers identify R² from closely related models already published in their field. In this letter, we describe how to derive R² from the reported C statistic (AUROC) of such existing prediction models with a binary outcome. Because the C statistic is commonly reported, our approach allows researchers to obtain R² for subsequent sample size calculations for new models. Stata and R code are provided, together with a small simulation study.
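The Python sketch below illustrates one way to go from a reported C statistic and outcome proportion to an anticipated Cox-Snell R². It rests on an illustrative assumption that is not necessarily the approach taken in the letter: the linear predictor is normally distributed with unit variance in both events and non-events, differing only in its mean μ, so that C = Φ(μ/√2). Under this assumption the logistic model is exactly correct, and the large-sample value of R² = 1 − exp(−LR/n) is obtained by numerically integrating the expected log-likelihood rather than by simulation. The function names `approx_cox_snell_r2`, `max_cox_snell_r2` and `min_n_shrinkage` are hypothetical. `min_n_shrinkage` implements only the shrinkage-based criterion n = p / ((S − 1) ln(1 − R²/S)) with target shrinkage S = 0.9, so it is not a complete minimum-sample-size calculation.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, N(0, 1)


def _softplus(z: float) -> float:
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def max_cox_snell_r2(prevalence: float) -> float:
    """Largest attainable Cox-Snell R^2 for a binary outcome with this proportion."""
    ll_null = prevalence * math.log(prevalence) + (1 - prevalence) * math.log(1 - prevalence)
    return 1 - math.exp(2 * ll_null)


def approx_cox_snell_r2(c_stat: float, prevalence: float, grid_size: int = 20001) -> float:
    """Large-sample Cox-Snell R^2 implied by a C statistic and outcome proportion.

    ASSUMPTION (illustrative, not taken from the letter): the linear predictor is
    N(0, 1) in non-events and N(mu, 1) in events, giving C = Phi(mu / sqrt(2)).
    The expected per-observation log-likelihood is integrated with the midpoint rule.
    """
    if not 0.5 <= c_stat < 1:
        raise ValueError("c_stat must lie in [0.5, 1)")
    if not 0 < prevalence < 1:
        raise ValueError("prevalence must lie in (0, 1)")

    mu = math.sqrt(2) * _STD_NORMAL.inv_cdf(c_stat)  # mean shift among events
    logit_prev = math.log(prevalence / (1 - prevalence))
    events = NormalDist(mu, 1.0)  # distribution of the linear predictor in events

    lo, hi = min(0.0, mu) - 10.0, max(0.0, mu) + 10.0
    h = (hi - lo) / grid_size
    expected_ll = 0.0
    for i in range(grid_size):
        x = lo + (i + 0.5) * h
        # True log-odds of an event given x under the binormal assumption
        eta = logit_prev + mu * x - mu * mu / 2
        log_p = -_softplus(-eta)   # log P(Y = 1 | x)
        log_1mp = -_softplus(eta)  # log P(Y = 0 | x)
        expected_ll += h * (prevalence * events.pdf(x) * log_p
                            + (1 - prevalence) * _STD_NORMAL.pdf(x) * log_1mp)

    ll_null = prevalence * math.log(prevalence) + (1 - prevalence) * math.log(1 - prevalence)
    # Cox-Snell R^2 = 1 - exp(-LR / n), where LR / n -> 2 * (E[ll_model] - ll_null)
    return 1 - math.exp(-2 * (expected_ll - ll_null))


def min_n_shrinkage(n_params: int, r2_cs: float, shrinkage: float = 0.9) -> int:
    """Shrinkage-based criterion only: n = p / ((S - 1) * ln(1 - R2_cs / S))."""
    if not 0 < r2_cs < shrinkage:
        raise ValueError("r2_cs must lie in (0, shrinkage)")
    return math.ceil(n_params / ((shrinkage - 1) * math.log(1 - r2_cs / shrinkage)))


if __name__ == "__main__":
    r2 = approx_cox_snell_r2(c_stat=0.8, prevalence=0.1)
    print(f"Cox-Snell R^2 ~ {r2:.3f} (max {max_cox_snell_r2(0.1):.3f})")
    print("n for 10 parameters, S = 0.9:", min_n_shrinkage(10, r2))
```

Running the module prints the R² implied by C = 0.8 with an outcome proportion of 0.1, the maximum attainable Cox-Snell R² for that proportion, and the corresponding shrinkage-based sample size for 10 candidate predictor parameters. Numerical integration is used here only to keep the sketch deterministic and dependency-free. A large simulated dataset with a fitted logistic regression would approximate the same quantity.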