

Extended sample size calculations for evaluation of prediction models using a threshold for classification.

Author information

Whittle Rebecca, Ensor Joie, Archer Lucinda, Collins Gary S, Dhiman Paula, Denniston Alastair, Alderman Joseph, Legha Amardeep, van Smeden Maarten, Moons Karel G, Cazier Jean-Baptiste, Riley Richard D, Snell Kym I E

Affiliations

Department of Applied Health Sciences, School of Health Sciences, College of Medicine and Health, University of Birmingham, Birmingham, UK.

National Institute for Health and Care Research (NIHR) Birmingham Biomedical Research Centre, Birmingham, UK.

Publication information

BMC Med Res Methodol. 2025 Jul 1;25(1):170. doi: 10.1186/s12874-025-02592-4.

Abstract

When evaluating the performance of a model for individualised risk prediction, the sample size needs to be large enough to precisely estimate the performance measures of interest. Current sample size guidance is based on precisely estimating calibration, discrimination, and net benefit, which should be the first stage of calculating the minimum required sample size. However, when a clinically important threshold is used for classification, other performance measures are also often reported. We extend the previously published guidance to precisely estimate threshold-based performance measures. For an external evaluation study of a prediction model with a binary outcome, we report closed-form solutions for the sample size required to target sufficiently precise estimates of accuracy, specificity, sensitivity, positive predictive value (PPV) and negative predictive value (NPV), and an iterative method for the sample size required to target a sufficiently precise estimate of the F1-score. This approach requires the user to pre-specify the target standard error and the expected value of each performance measure, alongside the outcome prevalence. We describe how the sample size formulae were derived and demonstrate their use in an example. Extension to time-to-event outcomes is also considered. In our examples, the minimum sample size required was lower than that required to precisely estimate the calibration slope, and we expect this would most often be the case. Our formulae, along with corresponding Python code and updated R, Stata and Python commands (pmvalsampsize), enable researchers to calculate the minimum sample size needed to precisely estimate threshold-based performance measures in an external evaluation study. These criteria should be used alongside previously published criteria for precisely estimating calibration, discrimination, and net benefit.
