
Sample size for binary logistic prediction models: Beyond events per variable criteria.

Affiliations

1 Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht, The Netherlands.

2 Centre for Statistics in Medicine, Botnar Research Centre, University of Oxford, Oxford, UK.

Publication information

Stat Methods Med Res. 2019 Aug;28(8):2455-2474. doi: 10.1177/0962280218784726. Epub 2018 Jul 3.

DOI:10.1177/0962280218784726
PMID:29966490
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6710621/
Abstract

Binary logistic regression is one of the most frequently applied statistical approaches for developing clinical prediction models. Developers of such models often rely on an Events Per Variable criterion (EPV), notably EPV ≥10, to determine the minimal sample size required and the maximum number of candidate predictors that can be examined. We present an extensive simulation study in which we studied the influence of EPV, events fraction, number of candidate predictors, the correlations and distributions of candidate predictor variables, area under the ROC curve, and predictor effects on out-of-sample predictive performance of prediction models. The out-of-sample performance (calibration, discrimination and probability prediction error) of developed prediction models was studied before and after regression shrinkage and variable selection. The results indicate that EPV does not have a strong relation with metrics of predictive performance, and is not an appropriate criterion for (binary) prediction model development studies. We show that out-of-sample predictive performance can better be approximated by considering the number of predictors, the total sample size and the events fraction. We propose that the development of new sample size criteria for prediction models should be based on these three parameters, and provide suggestions for improving sample size determination.
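
As a rough illustration of why EPV alone underdetermines a study design, the sketch below computes EPV for three hypothetical designs. The class name `DesignScenario`, its fields, and all numeric values are assumptions made for illustration and do not come from the paper. EPV is taken here as the number of events (counted in the less frequent outcome class) divided by the number of candidate predictors.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DesignScenario:
    """A hypothetical development-study design (illustrative values only)."""

    n_total: int       # total sample size
    n_events: int      # participants who experienced the outcome
    n_predictors: int  # candidate predictors considered

    @property
    def events_fraction(self) -> float:
        # Proportion of the sample with the outcome.
        return self.n_events / self.n_total

    @property
    def epv(self) -> float:
        # Assumption: events are counted in the less frequent outcome class.
        limiting = min(self.n_events, self.n_total - self.n_events)
        return limiting / self.n_predictors


# Three made-up designs with the same number of events and predictors
# but very different total sample sizes and events fractions.
scenarios = [
    DesignScenario(n_total=200, n_events=100, n_predictors=10),
    DesignScenario(n_total=1000, n_events=100, n_predictors=10),
    DesignScenario(n_total=4000, n_events=100, n_predictors=10),
]

for s in scenarios:
    print(
        f"N={s.n_total:5d}  events fraction={s.events_fraction:.3f}  "
        f"P={s.n_predictors}  EPV={s.epv:.1f}"
    )
```

All three designs satisfy EPV ≥ 10, yet they differ in total sample size and events fraction, two of the three parameters the abstract proposes as the basis for new sample size criteria. The sketch only shows the arithmetic; it does not reproduce the paper's simulations or any performance results.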


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f767/6710621/3d94000c9264/10.1177_0962280218784726-fig1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f767/6710621/b83e4d5ed358/10.1177_0962280218784726-fig2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f767/6710621/8452bfc1393c/10.1177_0962280218784726-fig3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f767/6710621/9e393387da79/10.1177_0962280218784726-fig4.jpg


