Adequate sample size for developing prediction models is not simply related to events per variable.

Authors

Ogundimu Emmanuel O, Altman Douglas G, Collins Gary S

Affiliations

Centre for Statistics in Medicine, Nuffield Department of Orthopaedics, Rheumatology & Musculoskeletal Diseases, Botnar Research Centre, University of Oxford, Windmill Road, Oxford OX3 7LD, UK.

Publication information

J Clin Epidemiol. 2016 Aug;76:175-82. doi: 10.1016/j.jclinepi.2016.02.031. Epub 2016 Mar 8.

Abstract

OBJECTIVES

The choice of an adequate sample size for a Cox regression analysis is generally based on a rule of thumb, derived from simulation studies, of a minimum of 10 events per variable (EPV). One simulation study suggested scenarios in which the 10 EPV rule can be relaxed. The effect of a range of binary predictors with varying prevalence, reflecting clinical practice, has not yet been fully investigated.
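As a minimal illustration of the arithmetic behind the rule of thumb, EPV is conventionally computed as the number of outcome events divided by the number of candidate predictor parameters. The sketch below is not taken from the paper; the function names and the example numbers (120 events, 12 parameters) are hypothetical.

```python
import math


def events_per_variable(n_events: int, n_parameters: int) -> float:
    """Return EPV = outcome events / candidate predictor parameters."""
    if n_parameters <= 0:
        raise ValueError("n_parameters must be positive")
    return n_events / n_parameters


def min_events_for_epv(target_epv: float, n_parameters: int) -> int:
    """Return the smallest event count that reaches the target EPV."""
    if n_parameters <= 0:
        raise ValueError("n_parameters must be positive")
    return math.ceil(target_epv * n_parameters)


# Hypothetical example: 120 events and 12 candidate parameters.
epv = events_per_variable(120, 12)
events_at_10 = min_events_for_epv(10, 12)
events_at_20 = min_events_for_epv(20, 12)
```

Under these assumed numbers, moving from a 10 EPV target to a 20 EPV target doubles the number of events the development data set must contain.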

STUDY DESIGN AND SETTING

We conducted an extended resampling study using a large general-practice data set, comprising over 2 million anonymized patient records, to examine the EPV requirements for prediction models with low-prevalence binary predictors developed using Cox regression. The performance of the models was then evaluated using an independent external validation data set. We investigated both fully specified models and models derived using variable selection.
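The abstract does not describe the sampling scheme, so the following is only a hypothetical sketch of how one resample with a chosen EPV could be drawn from a large source data set. It assumes a simple list of 0/1 event indicators and keeps the source event fraction; the function name, its parameters, and the example data are illustrative assumptions, not the authors' procedure.

```python
import random


def draw_sample_for_epv(event_flags, target_epv, n_parameters, seed=0):
    """Return sorted row indices of a random subsample whose event count
    equals round(target_epv * n_parameters), with non-events added so the
    subsample keeps (approximately) the source event fraction.

    Hypothetical helper for illustration; not the paper's algorithm.
    """
    rng = random.Random(seed)
    event_idx = [i for i, e in enumerate(event_flags) if e]
    nonevent_idx = [i for i, e in enumerate(event_flags) if not e]

    n_events = round(target_epv * n_parameters)
    if n_events > len(event_idx):
        raise ValueError("not enough events in the source data")

    # Size the subsample so events make up the same share as in the source.
    event_fraction = len(event_idx) / len(event_flags)
    n_total = round(n_events / event_fraction)
    n_nonevents = min(n_total - n_events, len(nonevent_idx))

    chosen = rng.sample(event_idx, n_events) + rng.sample(nonevent_idx, n_nonevents)
    return sorted(chosen)


# Hypothetical source data: 1,000 records with a 10% event fraction.
flags = [1] * 100 + [0] * 900
sample_idx = draw_sample_for_epv(flags, target_epv=5, n_parameters=4)
```

Repeating such a draw many times at each EPV level, fitting the model on every draw, and comparing the fitted coefficients with those from the full data set is one general way a resampling study can estimate bias.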

RESULTS

Our results indicated that an EPV rule of thumb should be data driven and that EPV ≥ 20 generally eliminates bias in regression coefficients when many low-prevalence predictors are included in a Cox model.

CONCLUSION

Higher EPV is needed when low-prevalence predictors are present in a model to eliminate bias in regression coefficients and improve predictive accuracy.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d813/5045274/83b3de32905f/gr1.jpg
