Zandkarimi Eghbal, Moghimbeigi Abbas, Mahjub Hossein, Majdzadeh Reza
Department of Biostatistics, School of Public Health, Hamadan University of Medical Sciences, Hamadan, Iran.
Modeling of Noncommunicable Diseases Research Center, Department of Biostatistics, School of Public Health, Hamadan University of Medical Sciences, Hamadan, Iran.
J Appl Stat. 2019 Jul 2;47(2):287-305. doi: 10.1080/02664763.2019.1636942. eCollection 2020.
A popular way to model correlated count data with excess zeros and over-dispersion simultaneously is the multilevel zero-inflated negative binomial (MZINB) distribution. Because the likelihood of these models is complex, numerical methods such as the expectation-maximization (EM) algorithm are used to estimate the parameters. However, in the presence of outliers, or when the mixture components are poorly separated, likelihood-based methods can become unstable. To overcome this challenge, we extend the robust expectation-solution (RES) approach to build a robust estimator of the regression parameters in the MZINB model. This approach achieves robustness by solving robust estimating equations in the S-step in place of the ordinary estimating equations of the EM algorithm's M-step. In the logistic component, the robust estimating equation weights only the design matrix (X), which reduces the effect of leverage points; in the negative binomial component, the influence of deviations in the response (Y) and in the design matrix (X) is bounded separately. Simulation studies under various settings show that, under contamination, the RES algorithm gives consistent estimates with smaller bias than the EM algorithm. The RES algorithm is applied to DMFT (decayed, missing, and filled teeth) index data and to fertility rate data.
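As a rough illustration of the estimating equations described above, the following sketch evaluates the pieces of one RES-style iteration for a simplified single-level ZINB model; the random effects of the multilevel model are deliberately left out. It computes the E-step posterior probability that each zero is structural, a Mallows-type leverage weight w(x) that downweights outlying rows of X, a logistic estimating function that weights only X (its "response" is the posterior probability z, already bounded in [0, 1]), and a negative binomial estimating function in which a Huber psi bounds outlying responses while w(x) separately bounds leverage. All function names, the median/MAD leverage measure, and the tuning constants (c = 2.0 for leverage, c = 1.345 for Huber) are our assumptions for illustration, not the paper's specification.

```python
import math


def expit(t):
    # Logistic (inverse-logit) function.
    return 1.0 / (1.0 + math.exp(-t))


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def e_step(y, pi, mu, k):
    """Posterior probability that each observation is a structural zero.

    pi: zero-inflation probabilities; mu: NB means; k: NB size parameter,
    so Var(Y | NB state) = mu + mu**2 / k. Random effects are ignored here.
    """
    z = []
    for yi, pii, mui in zip(y, pi, mu):
        if yi > 0:
            z.append(0.0)  # a positive count cannot come from the zero state
        else:
            f0 = (k / (k + mui)) ** k  # NB probability of a zero count
            z.append(pii / (pii + (1.0 - pii) * f0))
    return z


def median(v):
    s = sorted(v)
    n = len(s)
    return s[n // 2] if n % 2 else 0.5 * (s[n // 2 - 1] + s[n // 2])


def mallows_weights(X, c=2.0):
    """Leverage weights w_i = min(1, c / d_i) (assumed form).

    d_i is a coordinate-wise robust distance (median / scaled MAD) over the
    non-intercept columns; column 0 is assumed to be the intercept.
    """
    cols = list(zip(*X))[1:]
    centers = [median(col) for col in cols]
    scales = [1.4826 * median([abs(v - m) for v in col]) or 1.0
              for col, m in zip(cols, centers)]
    w = []
    for row in X:
        d = math.sqrt(sum(((v - m) / s) ** 2
                          for v, m, s in zip(row[1:], centers, scales)))
        w.append(1.0 if d <= c else c / d)
    return w


def huber_psi(r, c=1.345):
    # Clip standardized residuals to [-c, c]: bounds the influence of Y.
    return max(-c, min(c, r))


def nb_pmf(y, mu, k):
    # NB(mu, k) probability mass, computed on the log scale for stability.
    logp = (math.lgamma(y + k) - math.lgamma(k) - math.lgamma(y + 1)
            + k * math.log(k / (k + mu)) + y * math.log(mu / (k + mu)))
    return math.exp(logp)


def expected_psi(mu, k, c):
    """E[psi_c(r)] under NB(mu, k); subtracting it keeps the equation unbiased."""
    sd = math.sqrt(mu + mu * mu / k)
    upper = int(mu + 20 * sd) + 20  # truncation point for the infinite sum
    return sum(huber_psi((y - mu) / sd, c) * nb_pmf(y, mu, k)
               for y in range(upper + 1))


def robust_logit_score(gamma, X, z, w):
    """Sum_i w_i (z_i - pi_i) x_i: only the design matrix is weighted."""
    u = [0.0] * len(gamma)
    for row, zi, wi in zip(X, z, w):
        r = wi * (zi - expit(dot(row, gamma)))
        u = [uj + r * xj for uj, xj in zip(u, row)]
    return u


def robust_nb_score(beta, X, y, z, w, k, c=1.345):
    """Sum_i (1 - z_i) w_i [psi_c(r_i) - E psi_c] (mu_i / sd_i) x_i, log link."""
    u = [0.0] * len(beta)
    for row, yi, zi, wi in zip(X, y, z, w):
        mu = math.exp(dot(row, beta))
        sd = math.sqrt(mu + mu * mu / k)
        psi = huber_psi((yi - mu) / sd, c) - expected_psi(mu, k, c)
        r = (1.0 - zi) * wi * psi * mu / sd
        u = [uj + r * xj for uj, xj in zip(u, row)]
    return u
```

In a full loop one would alternate `e_step` with solving `robust_logit_score(gamma) = 0` and `robust_nb_score(beta) = 0` (for instance with a Newton or quasi-Newton root finder) until the estimates stabilize. Updating the dispersion k and integrating over the random effects of the multilevel model are not shown. Because `huber_psi` clips the residual and `mallows_weights` caps the leverage, a single extreme count or covariate value contributes only a bounded amount to either equation.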