Performance of logistic regression modeling: beyond the number of events per variable, the role of data structure.

Affiliation

Faculty of Medicine, University of Geneva, Switzerland.

Publication information

J Clin Epidemiol. 2011 Sep;64(9):993-1000. doi: 10.1016/j.jclinepi.2010.11.012. Epub 2011 Mar 16.

Abstract

OBJECTIVE

Logistic regression is commonly used in health research, and it is important to be sure that the parameter estimates can be trusted. A common problem occurs when the outcome has few events; in such cases, parameter estimates may be biased or unreliable. This study examined the relation between correctness of estimation and several data characteristics: number of events per variable (EPV), number of predictors, percentage of predictors that are highly correlated, percentage of predictors that are non-null, size of regression coefficients, and size of correlations.

STUDY DESIGN

Simulation studies.

RESULTS

In many situations, logistic regression modeling may pose substantial problems even if the number of EPV exceeds 10. Moreover, the number of EPV is not the only element that affects the correctness of parameter estimation. Large regression coefficients and high correlations between the predictors may cause serious problems in the estimation process. Finally, power is generally very low, even at 20 EPV.
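The abstract does not describe the simulation design, so the following is only a toy sketch of how a small-sample estimation problem can be examined by simulation. It is not the authors' method. It assumes a single binary predictor, for which the maximum likelihood estimate of the slope equals the log odds ratio of the 2x2 table and is undefined when any cell is empty. The function name `simulate_small_sample` and all parameter values are hypothetical.

```python
import math
import random


def simulate_small_sample(n, beta0, beta1, reps=2000, seed=0):
    """Toy simulation (illustrative assumption, not the study's design).

    Draws `reps` data sets of size `n` with one binary predictor x and
    outcome y ~ Bernoulli(sigmoid(beta0 + beta1 * x)), then summarizes
    how the estimated slope behaves.
    """
    rng = random.Random(seed)
    estimates = []
    undefined = 0
    total_events = 0
    for _ in range(reps):
        counts = [[0, 0], [0, 0]]  # counts[x][y]
        for _ in range(n):
            x = 1 if rng.random() < 0.5 else 0
            p = 1.0 / (1.0 + math.exp(-(beta0 + beta1 * x)))
            y = 1 if rng.random() < p else 0
            counts[x][y] += 1
        total_events += counts[0][1] + counts[1][1]
        a, b = counts[1][1], counts[1][0]  # x = 1: events, non-events
        c, d = counts[0][1], counts[0][0]  # x = 0: events, non-events
        if min(a, b, c, d) == 0:
            # An empty cell means the slope has no finite ML estimate.
            undefined += 1
            continue
        # With one binary predictor, the ML slope is the log odds ratio.
        estimates.append(math.log((a * d) / (b * c)))
    mean_estimate = sum(estimates) / len(estimates) if estimates else float("nan")
    return {
        "mean_events": total_events / reps,  # events per data set (one predictor)
        "fraction_undefined": undefined / reps,
        "mean_estimate": mean_estimate,
        "bias": mean_estimate - beta1,
    }


if __name__ == "__main__":
    for n in (40, 100, 400):
        print(n, simulate_small_sample(n, beta0=-2.0, beta1=1.0))
```

Comparing the output across values of `n` shows how often the estimate is undefined and how far the average estimate sits from the true slope as the number of events grows. A design closer to the one summarized above would need several correlated predictors and an iterative fitting routine.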

CONCLUSION

There is no single rule based on EPV that would guarantee an accurate estimation of logistic regression parameters. Instead, the number of predictors, probable size of the regression coefficients based on previous literature, and correlations among the predictors must be taken into account as guidelines to determine the necessary sample size.
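For readers unfamiliar with the quantity, the arithmetic behind an EPV rule can be sketched as follows. The helper names are hypothetical, and the code assumes the common convention that EPV counts the less frequent outcome category divided by the number of candidate predictors. As the conclusion states, meeting an EPV threshold does not by itself guarantee accurate estimation, so the result is at most a starting point.

```python
import math


def events_per_variable(n_events, n_nonevents, n_predictors):
    """EPV under the assumed convention: the rarer outcome count
    divided by the number of candidate predictors."""
    if n_predictors <= 0:
        raise ValueError("n_predictors must be positive")
    return min(n_events, n_nonevents) / n_predictors


def sample_size_for_epv(target_epv, n_predictors, event_rate):
    """Smallest total sample size expected to reach `target_epv`,
    given an anticipated event rate strictly between 0 and 1."""
    if not 0.0 < event_rate < 1.0:
        raise ValueError("event_rate must be strictly between 0 and 1")
    # The rarer outcome category is the limiting one.
    limiting_rate = min(event_rate, 1.0 - event_rate)
    return math.ceil(target_epv * n_predictors / limiting_rate)


if __name__ == "__main__":
    # 50 events and 450 non-events with 5 predictors
    print(events_per_variable(50, 450, 5))
    # Target of 20 EPV with 6 predictors and a 25% event rate
    print(sample_size_for_epv(20, 6, 0.25))
```

Predictor correlations and anticipated coefficient sizes, which the abstract identifies as equally important, are deliberately absent from this calculation.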
