Adequate sample size for developing prediction models is not simply related to events per variable.

Authors

Ogundimu Emmanuel O, Altman Douglas G, Collins Gary S

Affiliations

Centre for Statistics in Medicine, Nuffield Department of Orthopaedics, Rheumatology & Musculoskeletal Diseases, Botnar Research Centre, University of Oxford, Windmill Road, Oxford OX3 7LD, UK.

Publication information

J Clin Epidemiol. 2016 Aug;76:175-82. doi: 10.1016/j.jclinepi.2016.02.031. Epub 2016 Mar 8.

DOI:10.1016/j.jclinepi.2016.02.031
PMID:26964707
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5045274/
Abstract

OBJECTIVES

The choice of an adequate sample size for a Cox regression analysis is generally based on the rule of thumb derived from simulation studies of a minimum of 10 events per variable (EPV). One simulation study suggested scenarios in which the 10 EPV rule can be relaxed. The effect of a range of binary predictors with varying prevalence, reflecting clinical practice, has not yet been fully investigated.
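As a rough illustration of the arithmetic behind the rule of thumb, the sketch below treats EPV as the number of outcome events divided by the number of candidate predictor parameters. The function names and the example numbers are hypothetical and are not taken from the paper.

```python
import math


def events_per_variable(n_events: int, n_parameters: int) -> float:
    """EPV = outcome events / candidate predictor parameters."""
    if n_parameters <= 0:
        raise ValueError("n_parameters must be positive")
    return n_events / n_parameters


def min_events_for_epv(target_epv: float, n_parameters: int) -> int:
    """Smallest number of events that reaches the target EPV."""
    if n_parameters <= 0:
        raise ValueError("n_parameters must be positive")
    return math.ceil(target_epv * n_parameters)


# Hypothetical example: 12 candidate parameters.
print(events_per_variable(120, 12))   # EPV with 120 events
print(min_events_for_epv(10, 12))     # events needed for EPV = 10
print(min_events_for_epv(20, 12))     # events needed for EPV = 20
```

Doubling the target EPV doubles the number of events required, which is why the choice of threshold has a direct effect on the feasibility of a study.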

STUDY DESIGN AND SETTING

We conducted an extended resampling study using a large general-practice data set, comprising over 2 million anonymized patient records, to examine the EPV requirements for prediction models with low-prevalence binary predictors developed using Cox regression. The performance of the models was then evaluated using an independent external validation data set. We investigated both fully specified models and models derived using variable selection.
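The abstract does not describe the resampling scheme in detail, so the following is only a minimal sketch of one way to draw a subsample with a chosen EPV from a large cohort. It assumes the overall event fraction is preserved; the function name `resample_for_epv` and its parameters are hypothetical.

```python
import random


def resample_for_epv(event_flags, target_epv, n_parameters, seed=0):
    """Return indices of a random subsample with target_epv * n_parameters
    events, keeping roughly the same event fraction as the full cohort.

    Assumption: this is an illustrative scheme, not the authors' procedure.
    """
    rng = random.Random(seed)
    events = [i for i, e in enumerate(event_flags) if e]
    non_events = [i for i, e in enumerate(event_flags) if not e]

    n_events = round(target_epv * n_parameters)
    if n_events < 1 or n_events > len(events):
        raise ValueError("cohort cannot supply the requested number of events")

    # Scale the total sample size so the event fraction stays the same.
    n_total = round(n_events * len(event_flags) / len(events))
    n_non_events = n_total - n_events
    if n_non_events > len(non_events):
        raise ValueError("cohort has too few non-events")

    chosen = rng.sample(events, n_events) + rng.sample(non_events, n_non_events)
    rng.shuffle(chosen)
    return chosen


# Hypothetical cohort: 1000 patients, 100 events, 4 model parameters, EPV = 5.
flags = [1] * 100 + [0] * 900
idx = resample_for_epv(flags, target_epv=5, n_parameters=4)
print(len(idx), sum(flags[i] for i in idx))
```

Repeating the draw with different seeds and refitting the model each time gives a distribution of coefficient estimates at each EPV level.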

RESULTS

Our results indicated that an EPV rule of thumb should be data driven and that EPV ≥ 20 generally eliminates bias in regression coefficients when many low-prevalence predictors are included in a Cox model.
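The abstract does not state how bias was measured. One common summary, assumed here purely for illustration, is the percent relative bias of the coefficient estimates from repeated resamples against a reference value (for example, the estimate from the full data set). The function name and the numbers below are hypothetical.

```python
from statistics import mean


def percent_relative_bias(estimates, reference):
    """Percent relative bias: 100 * (mean(estimates) - reference) / reference.

    Assumption: a generic bias summary, not necessarily the paper's metric.
    """
    if reference == 0:
        raise ValueError("reference coefficient must be non-zero")
    if not estimates:
        raise ValueError("estimates must not be empty")
    return 100.0 * (mean(estimates) - reference) / reference


# Hypothetical log hazard ratio estimates from four resamples.
resampled = [0.55, 0.45, 0.50, 0.60]
print(percent_relative_bias(resampled, reference=0.50))
```

A value near zero at a given EPV would indicate that the coefficient is estimated with little systematic error at that sample size.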

CONCLUSION

Higher EPV is needed when low-prevalence predictors are present in a model to eliminate bias in regression coefficients and improve predictive accuracy.
