Extended sample size calculations for evaluation of prediction models using a threshold for classification.

Authors

Whittle Rebecca, Ensor Joie, Archer Lucinda, Collins Gary S, Dhiman Paula, Denniston Alastair, Alderman Joseph, Legha Amardeep, van Smeden Maarten, Moons Karel G, Cazier Jean-Baptiste, Riley Richard D, Snell Kym I E

Affiliations

Department of Applied Health Sciences, School of Health Sciences, College of Medicine and Health, University of Birmingham, Birmingham, UK.

National Institute for Health and Care Research (NIHR) Birmingham Biomedical Research Centre, Birmingham, UK.

Publication information

BMC Med Res Methodol. 2025 Jul 1;25(1):170. doi: 10.1186/s12874-025-02592-4.

DOI: 10.1186/s12874-025-02592-4
PMID: 40596983

Abstract

When evaluating the performance of a model for individualised risk prediction, the sample size needs to be large enough to precisely estimate the performance measures of interest. Current sample size guidance is based on precisely estimating calibration, discrimination, and net benefit, which should be the first stage of calculating the minimum required sample size. However, when a clinically important threshold is used for classification, other performance measures are also often reported. We extend the previously published guidance to precisely estimate threshold-based performance measures. We report closed-form solutions to estimate the sample size required to target sufficiently precise estimates of accuracy, specificity, sensitivity, positive predictive value (PPV), and negative predictive value (NPV), and an iterative method to estimate the sample size required to target a sufficiently precise estimate of the F1-score, in an external evaluation study of a prediction model with a binary outcome. This approach requires the user to pre-specify the target standard error and the expected value for each performance measure alongside the outcome prevalence. We describe how the sample size formulae were derived and demonstrate their use in an example. Extension to time-to-event outcomes is also considered. In our examples, the minimum sample size required was lower than that required to precisely estimate the calibration slope, and we expect this would most often be the case. Our formulae, along with corresponding Python code and updated R, Stata and Python commands (pmvalsampsize), enable researchers to calculate the minimum sample size needed to precisely estimate threshold-based performance measures in an external evaluation study. These criteria should be used alongside previously published criteria to precisely estimate the calibration, discrimination, and net benefit.
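
To make the idea of a closed-form calculation concrete, the following is a minimal sketch that assumes each threshold-based measure is a simple proportion with the usual binomial standard error, so that the total sample size follows from solving SE = sqrt(p(1 - p) / (n × fraction)) for n. This is an illustration under that assumption and is not taken from the paper; the function names, the derivation of PPV, NPV and accuracy from sensitivity, specificity and prevalence, and the input values are all hypothetical, and the code does not use the pmvalsampsize commands.

```python
import math


def n_for_binomial_se(expected: float, target_se: float, fraction: float = 1.0) -> int:
    """Smallest total n with sqrt(p * (1 - p) / (n * fraction)) <= target_se.

    `fraction` is the expected share of the total sample that forms the
    denominator of the measure (for example, the prevalence for sensitivity).
    Assumption: a plain binomial standard error, not the paper's derivation.
    """
    if not 0.0 < expected < 1.0:
        raise ValueError("expected must lie strictly between 0 and 1")
    if target_se <= 0.0 or not 0.0 < fraction <= 1.0:
        raise ValueError("target_se must be > 0 and fraction must be in (0, 1]")
    raw = expected * (1.0 - expected) / (target_se ** 2 * fraction)
    # Subtract a tiny tolerance so floating-point noise does not add a participant.
    return math.ceil(raw - 1e-9)


def threshold_sample_sizes(prevalence: float, sensitivity: float,
                           specificity: float, target_se: float) -> dict:
    """Total sample size per measure, with the same target SE for every measure.

    Simplification (hypothetical): PPV, NPV and accuracy are derived from the
    expected sensitivity, specificity and prevalence instead of being
    specified separately by the user.
    """
    for value in (prevalence, sensitivity, specificity):
        if not 0.0 < value < 1.0:
            raise ValueError("inputs must lie strictly between 0 and 1")
    # Expected proportion of participants classified as positive.
    p_pos = sensitivity * prevalence + (1.0 - specificity) * (1.0 - prevalence)
    ppv = sensitivity * prevalence / p_pos
    npv = specificity * (1.0 - prevalence) / (1.0 - p_pos)
    accuracy = sensitivity * prevalence + specificity * (1.0 - prevalence)
    return {
        "sensitivity": n_for_binomial_se(sensitivity, target_se, prevalence),
        "specificity": n_for_binomial_se(specificity, target_se, 1.0 - prevalence),
        "ppv": n_for_binomial_se(ppv, target_se, p_pos),
        "npv": n_for_binomial_se(npv, target_se, 1.0 - p_pos),
        "accuracy": n_for_binomial_se(accuracy, target_se),
    }


# Hypothetical inputs chosen only for illustration.
sizes = threshold_sample_sizes(prevalence=0.1, sensitivity=0.8,
                               specificity=0.7, target_se=0.025)
print(sizes)
print("largest requirement:", max(sizes, key=sizes.get))
```

Under these assumptions the measure with the smallest expected denominator tends to drive the requirement, which is why a rare outcome inflates the sample size needed for sensitivity. Any such figure would still need to be compared with the sample size required for calibration, discrimination, and net benefit, as the abstract recommends.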


