Faculty of Medicine, University of Geneva, Switzerland.
J Clin Epidemiol. 2011 Sep;64(9):993-1000. doi: 10.1016/j.jclinepi.2010.11.012. Epub 2011 Mar 16.
Logistic regression is commonly used in health research, and it is important to ensure that its parameter estimates can be trusted. A common problem occurs when the outcome has few events; in such cases, parameter estimates may be biased or unreliable. This study examined the relation between the correctness of estimation and several data characteristics: the number of events per variable (EPV), the number of predictors, the percentage of predictors that are highly correlated, the percentage of predictors that are non-null, the size of the regression coefficients, and the size of the correlations.
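The EPV ratio itself is simple arithmetic. The sketch below is a minimal illustration. It assumes the common convention that "events" means the less frequent of the two outcome categories. The function names `events_per_variable` and `sample_size_for_epv` are hypothetical, and `sample_size_for_epv` only inverts the ratio for an expected prevalence. As this study argues, reaching an EPV target does not by itself guarantee accurate estimates.

```python
import math


def events_per_variable(n_events: int, n_non_events: int, n_predictors: int) -> float:
    """EPV: size of the less frequent outcome category divided by the number of predictors."""
    if n_predictors <= 0:
        raise ValueError("n_predictors must be positive")
    limiting_count = min(n_events, n_non_events)
    return limiting_count / n_predictors


def sample_size_for_epv(target_epv: float, n_predictors: int, prevalence: float) -> int:
    """Smallest total sample size whose *expected* count of the rarer outcome reaches
    target_epv * n_predictors, given that outcome's prevalence (0 < prevalence <= 0.5).
    Hypothetical helper: it only inverts the EPV ratio and says nothing about estimation accuracy."""
    if not 0 < prevalence <= 0.5:
        raise ValueError("prevalence must lie in (0, 0.5]")
    required_events = target_epv * n_predictors
    return math.ceil(required_events / prevalence)


# 40 events among 1000 subjects with 5 predictors gives EPV = 8.0.
print(events_per_variable(40, 960, 5))
# Reaching 10 EPV with 5 predictors at 15% prevalence needs at least 334 subjects.
print(sample_size_for_epv(10, 5, 0.15))
```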
Simulation studies.
In many situations, logistic regression modeling may pose substantial problems even when the number of EPV exceeds 10. Moreover, the number of EPV is not the only factor that affects the correctness of parameter estimation. Large regression coefficients and high correlations between predictors can cause serious problems in the estimation process. Finally, statistical power is generally very low, even at 20 EPV.
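A simulation of the kind summarized above can be sketched in a few dozen lines. This is a minimal illustration, not the authors' design. It assumes two standard-normal predictors with correlation `rho` (|rho| < 1), a logistic model with hypothetical coefficients `betas`, and a plain Newton-Raphson maximum-likelihood fit. It reports three values:

- the mean relative bias of the first slope among converged fits;
- the share of failed fits;
- the average realized EPV.

A fit counts as failed, as a heuristic assumption, when it does not converge within 25 iterations, hits a singular Hessian, or produces a coefficient beyond 25 in absolute value (a typical sign of separation). All function names are hypothetical.

```python
import math
import random


def sigmoid(eta):
    """Numerically stable logistic function."""
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    e = math.exp(eta)
    return e / (1.0 + e)


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting; None if (near-)singular."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[piv][col]) < 1e-12:
            return None
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def fit_logistic(X, y, max_iter=25, tol=1e-8, max_abs_coef=25.0):
    """Maximum-likelihood logistic fit by Newton-Raphson; X rows start with 1.0 (intercept).
    Returns None on failure (assumed heuristic: no convergence, singular Hessian,
    or a coefficient beyond max_abs_coef, as happens under separation)."""
    p = len(X[0])
    beta = [0.0] * p
    for _ in range(max_iter):
        grad = [0.0] * p
        hess = [[0.0] * p for _ in range(p)]
        for xi, yi in zip(X, y):
            mu = sigmoid(sum(b * v for b, v in zip(beta, xi)))
            w = mu * (1.0 - mu)
            for j in range(p):
                grad[j] += (yi - mu) * xi[j]
                for k in range(p):
                    hess[j][k] += w * xi[j] * xi[k]
        step = solve(hess, grad)  # Newton step: (X'WX)^{-1} X'(y - mu)
        if step is None:
            return None
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(b) for b in beta) > max_abs_coef:
            return None
        if max(abs(s) for s in step) < tol:
            return beta
    return None


def generate_data(n, betas, rho, rng):
    """Two standard-normal predictors with correlation rho; binary outcome from the logistic model."""
    b0, b1, b2 = betas
    X, y = [], []
    for _ in range(n):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        x1, x2 = z1, rho * z1 + math.sqrt(1.0 - rho ** 2) * z2
        X.append([1.0, x1, x2])
        y.append(1 if rng.random() < sigmoid(b0 + b1 * x1 + b2 * x2) else 0)
    return X, y


def simulate(n, betas, rho, n_reps, seed=0):
    """Return (mean relative bias of beta_1 over converged fits, share of failed fits, mean EPV)."""
    rng = random.Random(seed)
    estimates, failures, epvs = [], 0, []
    for _ in range(n_reps):
        X, y = generate_data(n, betas, rho, rng)
        events = sum(y)
        epvs.append(min(events, n - events) / 2)  # two predictors
        beta_hat = fit_logistic(X, y)
        if beta_hat is None:
            failures += 1
        else:
            estimates.append(beta_hat[1])
    mean_est = sum(estimates) / len(estimates) if estimates else float("nan")
    return (mean_est - betas[1]) / betas[1], failures / n_reps, sum(epvs) / n_reps


print(simulate(n=100, betas=(-2.0, 1.0, 1.0), rho=0.8, n_reps=100, seed=1))
```

To probe the claim that EPV alone does not determine estimation quality, vary `n`, `rho`, and the slopes one at a time and compare the reported bias and failure share at similar realized EPV. The sketch omits power. Power could be added by counting Wald-test rejections using standard errors from the inverse Hessian.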
There is no single rule based on EPV that would guarantee accurate estimation of logistic regression parameters. Instead, the number of predictors, the probable size of the regression coefficients based on previous literature, and the correlations among the predictors must be taken into account as guidelines for determining the necessary sample size.