Minimum sample size for developing a multivariable prediction model: PART II - binary and time-to-event outcomes.

Affiliations

Centre for Prognosis Research, Research Institute for Primary Care and Health Sciences, Keele University, Staffordshire, UK.

Department of Biostatistics, Vanderbilt University School of Medicine, Nashville, Tennessee.

Publication information

Stat Med. 2019 Mar 30;38(7):1276-1296. doi: 10.1002/sim.7992. Epub 2018 Oct 24.

DOI:10.1002/sim.7992

PMID:30357870

Full text link: https://pmc.ncbi.nlm.nih.gov/articles/PMC6519266/

Abstract

When designing a study to develop a new prediction model with binary or time-to-event outcomes, researchers should ensure their sample size is adequate in terms of the number of participants (n) and outcome events (E) relative to the number of predictor parameters (p) considered for inclusion. We propose that the minimum values of n and E (and subsequently the minimum number of events per predictor parameter, EPP) should be calculated to meet the following three criteria: (i) small optimism in predictor effect estimates as defined by a global shrinkage factor of ≥ 0.9, (ii) small absolute difference of ≤ 0.05 in the model's apparent and adjusted Nagelkerke's R², and (iii) precise estimation of the overall risk in the population. Criteria (i) and (ii) aim to reduce overfitting conditional on a chosen p, and require prespecification of the model's anticipated Cox-Snell R², which we show can be obtained from previous studies. The values of n and E that meet all three criteria provide the minimum sample size required for model development. Upon application of our approach, a new diagnostic model for Chagas disease requires an EPP of at least 4.8 and a new prognostic model for recurrent venous thromboembolism requires an EPP of at least 23. This reinforces why rules of thumb (eg, 10 EPP) should be avoided. Researchers might additionally ensure the sample size gives precise estimates of key predictor effects; this is especially important when key categorical predictors have few events in some categories, as this may substantially increase the numbers required.
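The three criteria can be sketched in code for a binary outcome. The abstract does not state the formulas, so the expressions below are an assumption based on the closed-form versions usually associated with this approach: n = p / ((S − 1) ln(1 − R²_CS / S)) for criterion (i), the same expression with S = R²_CS / (R²_CS + δ · max R²_CS) for criterion (ii), and a normal-approximation margin of error for criterion (iii). The function names, the 1.96 multiplier, and the 0.05 margin are illustrative choices, not taken from the abstract.

```python
import math


def n_for_shrinkage(p, r2_cs, shrinkage=0.9):
    """Criterion (i): smallest n giving an expected global shrinkage factor
    of at least `shrinkage`, for p predictor parameters and an anticipated
    Cox-Snell R-squared `r2_cs` (assumed formula, see lead-in)."""
    return math.ceil(p / ((shrinkage - 1.0) * math.log(1.0 - r2_cs / shrinkage)))


def max_r2_cs(prevalence):
    """Largest attainable Cox-Snell R-squared for a binary outcome with the
    given outcome proportion, from the null-model log-likelihood per person."""
    ln_l_null = (prevalence * math.log(prevalence)
                 + (1.0 - prevalence) * math.log(1.0 - prevalence))
    return 1.0 - math.exp(2.0 * ln_l_null)


def n_for_r2_difference(p, r2_cs, prevalence, delta=0.05):
    """Criterion (ii): n such that apparent and adjusted Nagelkerke
    R-squared differ by at most `delta`. The target difference is converted
    into a required shrinkage factor, then criterion (i)'s formula is reused."""
    required_shrinkage = r2_cs / (r2_cs + delta * max_r2_cs(prevalence))
    return n_for_shrinkage(p, r2_cs, shrinkage=required_shrinkage)


def n_for_overall_risk(prevalence, margin=0.05):
    """Criterion (iii): n so that a 95% interval for the overall outcome
    proportion has half-width of at most `margin` (normal approximation)."""
    return math.ceil((1.96 / margin) ** 2 * prevalence * (1.0 - prevalence))


def minimum_sample_size(p, r2_cs, prevalence):
    """Return (n, E, EPP) where n satisfies all three criteria at once."""
    n = max(
        n_for_shrinkage(p, r2_cs),
        n_for_r2_difference(p, r2_cs, prevalence),
        n_for_overall_risk(prevalence),
    )
    events = n * prevalence
    return n, events, events / p


if __name__ == "__main__":
    # Hypothetical inputs chosen only for illustration, not from the paper.
    n, events, epp = minimum_sample_size(p=10, r2_cs=0.2, prevalence=0.3)
    print(f"n = {n}, E = {events:.1f}, EPP = {epp:.1f}")
```

Because the required n depends on the anticipated R²_CS and the outcome proportion as well as on p, the resulting EPP varies from one setting to another, which is consistent with the abstract's point that a fixed rule such as 10 EPP should be avoided.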

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7474/6519266/4569421a81be/SIM-38-1276-g001.jpg
