No rationale for 1 variable per 10 events criterion for binary logistic regression analysis.

Authors

van Smeden Maarten, de Groot Joris A H, Moons Karel G M, Collins Gary S, Altman Douglas G, Eijkemans Marinus J C, Reitsma Johannes B

Affiliations

Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Heidelberglaan 100, Utrecht, The Netherlands.

Centre for Statistics in Medicine, Botnar Research Centre, University of Oxford, Oxford, UK.

Publication details

BMC Med Res Methodol. 2016 Nov 24;16(1):163. doi: 10.1186/s12874-016-0267-3.

Abstract

BACKGROUND

Ten events per variable (EPV) is a widely advocated minimal criterion for sample size considerations in logistic regression analysis. Of three previous simulation studies that examined this minimal EPV criterion, only one supports the use of a minimum of 10 EPV. In this paper, we examine the reasons for the substantial differences between these extensive simulation studies.
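For readers unfamiliar with the criterion, the arithmetic behind EPV can be sketched as follows. The abstract does not spell out how events are counted; this sketch assumes the common convention that "events" means the size of the smaller outcome group and that "variables" means the number of estimated coefficients excluding the intercept. The function name `events_per_variable` is hypothetical.

```python
def events_per_variable(outcomes, n_coefficients):
    """Return EPV for a binary outcome vector.

    Assumption: 'events' is the count of the rarer outcome class, and
    n_coefficients excludes the intercept.
    """
    if n_coefficients <= 0:
        raise ValueError("n_coefficients must be positive")
    n_events = sum(1 for y in outcomes if y == 1)
    n_nonevents = len(outcomes) - n_events
    return min(n_events, n_nonevents) / n_coefficients


# 200 subjects, 30 of whom experience the event, and 3 candidate covariates
outcomes = [1] * 30 + [0] * 170
print(events_per_variable(outcomes, 3))  # 10.0
```

Under this convention, the same 30 events support only 3 covariates at the 10 EPV threshold regardless of whether the total sample is 200 or 2,000.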

METHODS

The current study uses Monte Carlo simulations to evaluate small sample bias, coverage of confidence intervals and mean square error of logit coefficients. Logistic regression models fitted by maximum likelihood and a modified estimation procedure, known as Firth's correction, are compared.
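A minimal sketch of this kind of comparison is given below. It is an illustration under stated assumptions, not the authors' simulation code: the Firth fit uses the standard modified score, which adds h_i(0.5 - p_i) to each residual (h_i being the leverage), and the data-generating settings (standard normal covariates, the coefficient values, the sample size, the seed) are arbitrary choices made here. The names `fit_logistic` and `simulate_bias` are hypothetical.

```python
import numpy as np


def fit_logistic(X, y, firth=False, max_iter=50, tol=1e-8):
    """Fit a logistic model by Fisher scoring; X must include an intercept column.

    Returns (coefficients, converged_flag).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        w = p * (1.0 - p)
        try:
            info_inv = np.linalg.inv(X.T @ (X * w[:, None]))  # (X'WX)^-1
        except np.linalg.LinAlgError:
            return beta, False
        resid = y - p
        if firth:
            # Leverages: diagonal of W^(1/2) X (X'WX)^-1 X' W^(1/2)
            h = w * np.einsum("ij,jk,ik->i", X, info_inv, X)
            resid = resid + h * (0.5 - p)
        step = info_inv @ (X.T @ resid)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            return beta, True
    return beta, False


def simulate_bias(true_beta, n, n_sim, seed=0):
    """Monte Carlo estimate of coefficient bias for ML and Firth fits."""
    rng = np.random.default_rng(seed)
    true_beta = np.asarray(true_beta, dtype=float)
    estimates = {"ml": [], "firth": []}
    for _ in range(n_sim):
        covariates = rng.normal(size=(n, len(true_beta) - 1))
        X = np.column_stack([np.ones(n), covariates])
        p = 1.0 / (1.0 + np.exp(-X @ true_beta))
        y = rng.binomial(1, p)
        fits = {"ml": fit_logistic(X, y), "firth": fit_logistic(X, y, firth=True)}
        # Assumption: data sets where either fit fails to converge are dropped.
        # This handling choice is itself something that can change the results.
        if all(ok for _, ok in fits.values()):
            for name, (beta, _) in fits.items():
                estimates[name].append(beta)
    return {name: np.mean(est, axis=0) - true_beta for name, est in estimates.items()}


bias = simulate_bias(true_beta=[0.0, 1.0], n=80, n_sim=200, seed=2016)
print(bias)
```

Coverage and mean squared error could be accumulated in the same loop. They are omitted here to keep the sketch short.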

RESULTS

The results show that, besides EPV, the problems associated with low EPV depend on other factors such as the total sample size. It is also demonstrated that simulation results can be dominated by even a few simulated data sets in which the covariates predict the outcome perfectly ('separation'). We reveal that different approaches for identifying and handling separation lead to substantially different simulation results. We further show that Firth's correction can be used to improve the accuracy of regression coefficients and alleviate the problems associated with separation.
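One way to identify separation in a simulated data set is a linear-programming check, sketched below. This is an illustrative approach chosen here and is not necessarily the detection method used in the study. It looks for a coefficient vector whose linear predictor has the correct sign (or is zero) for every observation and is strictly correct for at least one; if such a vector exists, the data are completely or quasi-completely separated and the maximum likelihood estimate does not exist. The function name `has_separation` and the tolerance are assumptions.

```python
import numpy as np
from scipy.optimize import linprog


def has_separation(X, y, tol=1e-6):
    """Return True if the data show complete or quasi-complete separation.

    X must include an intercept column; y is coded 0/1.
    """
    X = np.asarray(X, dtype=float)
    s = 2.0 * np.asarray(y, dtype=float) - 1.0   # recode outcome to -1 / +1
    Z = X * s[:, None]                           # row i is s_i * x_i
    # Maximise sum_i s_i * x_i'beta subject to s_i * x_i'beta >= 0 for all i,
    # with beta restricted to a box so the problem stays bounded.
    res = linprog(
        c=-Z.sum(axis=0),
        A_ub=-Z,
        b_ub=np.zeros(len(Z)),
        bounds=[(-1.0, 1.0)] * X.shape[1],
    )
    return bool(res.success and -res.fun > tol)


x = np.array([-2.0, -1.0, 1.0, 2.0])
X = np.column_stack([np.ones_like(x), x])
print(has_separation(X, [0, 0, 1, 1]))  # True: x < 0 perfectly predicts y = 0
print(has_separation(X, [0, 1, 0, 1]))  # False: the outcome groups overlap
```

In a simulation, a flag like this could decide whether a data set is dropped, refitted with Firth's correction, or kept as is, which is exactly the kind of handling choice the results describe as influential.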

CONCLUSIONS

The current evidence supporting EPV rules for binary logistic regression is weak. Given our findings, there is an urgent need for new research to provide guidance for supporting sample size considerations for binary logistic regression analysis.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b7b7/5122171/2bd18a78815c/12874_2016_267_Fig1_HTML.jpg
