Department of Statistics, University of Barishal, Barishal, Bangladesh.
Institute of Statistical Research and Training, University of Dhaka, Dhaka, Bangladesh.
Stat Med. 2019 Jun 30;38(14):2544-2560. doi: 10.1002/sim.8126. Epub 2019 Feb 22.
The generalized estimating equation (GEE) is a popular approach for analyzing correlated binary data. However, the problems caused by separation in GEE have not yet been studied. Separation created by a covariate often occurs in small correlated binary datasets, and even in large datasets with a rare outcome and/or high intra-cluster correlation and a number of influential covariates. This paper investigated the consequences of separation in GEE and addressed them by introducing a penalized GEE, termed PGEE. The PGEE is obtained by adding a Firth-type penalty term, originally proposed for the score equation of the generalized linear model, to the standard GEE. It is shown to achieve convergence and provide finite estimates of the regression coefficients in the presence of separation, which is often not possible with GEE. Further, a small-sample bias correction to the sandwich covariance estimator of the PGEE estimator is suggested. Simulations showed that the GEE failed to achieve convergence and/or provided infinitely large estimates of the regression coefficients in the presence of complete or quasi-complete separation, whereas the PGEE showed significant improvement by achieving convergence and providing finite estimates. Even in the presence of near-separation, the PGEE showed superior properties over the GEE. Furthermore, the bias-corrected sandwich estimator for the PGEE estimator showed substantial improvement over the standard sandwich estimator by reducing bias in the estimated type I error rate. An illustration using real data supported the simulation findings. The PGEE with the bias-corrected sandwich covariance estimator is recommended for small-to-moderate samples (N ≤ 50), and can also be used for large samples if there is any evidence of separation or near-separation.
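To make the idea of a Firth-type penalty concrete, the following is a minimal sketch in Python of the simplest special case only: logistic regression with an independence working correlation, where the penalized score reduces to Firth's modified score for a generalized linear model. It is not the authors' PGEE implementation, it ignores clustering, and it omits the bias-corrected sandwich estimator; the function name `firth_logistic` and the toy data are illustrative assumptions.

```python
import numpy as np

def firth_logistic(X, y, max_iter=100, tol=1e-10):
    """Firth-penalized logistic regression via Fisher scoring.

    Illustrative sketch only: independence working correlation,
    no clustering, no sandwich covariance estimator.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))    # fitted probabilities
        w = p * (1.0 - p)                      # binomial variance weights
        info = X.T @ (w[:, None] * X)          # Fisher information X'WX
        info_inv = np.linalg.inv(info)
        # Diagonal of the hat matrix W^{1/2} X (X'WX)^{-1} X' W^{1/2}
        h = w * np.einsum("ij,jk,ik->i", X, info_inv, X)
        # Firth-modified score: ordinary score plus the penalty term
        score = X.T @ (y - p + h * (0.5 - p))
        step = info_inv @ score
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta

# Toy data with complete separation: y equals the binary covariate x,
# so the unpenalized maximum likelihood slope estimate is infinite.
x = np.array([0, 0, 0, 0, 1, 1, 1, 1])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])
X = np.column_stack([np.ones_like(x), x])

beta = firth_logistic(X, y)
print(beta)  # finite intercept and slope despite separation
```

In this saturated toy model the penalized solution coincides with adding 0.5 to each cell of the 2×2 table, so the fitted probabilities are 0.1 and 0.9 and the slope is 2·log(9) ≈ 4.39, a finite value where the unpenalized estimate diverges.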