Fulton Kara A, Liu Danping, Haynie Denise L, Albert Paul S
Division of Intramural Population Health Research, Eunice Kennedy Shriver National Institute of Child Health and Human Development, Bethesda, Maryland 20852, USA
Ann Appl Stat. 2015;9(1):275-299. doi: 10.1214/14-AOAS791. Epub 2015 Apr 28.
The NEXT Generation Health study investigates dating violence among adolescents using a survey questionnaire. Each student is asked to affirm or deny multiple instances of violence in his or her dating relationship. There is, however, evidence that students who were not in a relationship also responded to the survey, resulting in excess zeros in the responses. This paper proposes likelihood-based and estimating-equation approaches for analyzing zero-inflated clustered binary response data. We adopt a mixed model to account for the cluster effect and estimate the model parameters by maximum likelihood (ML), which requires a Gauss-Hermite quadrature (GHQ) approximation for implementation. Because an incorrect assumption about the random-effects distribution may bias the results, we also construct generalized estimating equations (GEE) that do not require correct specification of the within-cluster correlation. In a series of simulation studies, we examine the performance of the ML and GEE methods in terms of bias, efficiency and robustness. We illustrate the importance of properly accounting for zero inflation by reanalyzing the NEXT data, in which this issue had previously been ignored.
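To make the ML step concrete, the following is a minimal sketch of how a marginal likelihood for one cluster might be approximated with Gauss-Hermite quadrature. It is not the authors' implementation, and the model form is an assumption: each student (cluster) is taken to be in a relationship with probability `pi_rel`; students not in a relationship answer zero to every item; students in a relationship answer item j according to a logistic model with linear predictor `eta[j]` plus a normal random intercept with standard deviation `sigma`. The names `cluster_likelihood`, `pi_rel`, `eta`, `sigma` and `n_nodes` are hypothetical and chosen for illustration only.

```python
import numpy as np
from numpy.polynomial.hermite import hermgauss


def expit(z):
    """Inverse logit."""
    return 1.0 / (1.0 + np.exp(-z))


def cluster_likelihood(y, eta, pi_rel, sigma, n_nodes=20):
    """Approximate marginal likelihood of one cluster's binary responses.

    Assumed model (illustrative, not taken from the paper):
      with prob. 1 - pi_rel : all responses are structural zeros
      with prob. pi_rel     : y_j | b ~ Bernoulli(expit(eta_j + b)),
                              b ~ N(0, sigma^2)
    """
    y = np.asarray(y)
    eta = np.asarray(eta, dtype=float)

    # Nodes and weights for the integral of exp(-x^2) f(x) over the real line.
    nodes, weights = hermgauss(n_nodes)

    # Change of variables b = sqrt(2) * sigma * x maps N(0, sigma^2) onto
    # the Hermite weight function, up to a factor 1 / sqrt(pi).
    b = np.sqrt(2.0) * sigma * nodes                    # shape (K,)
    p = expit(eta[None, :] + b[:, None])                # shape (K, J)

    # Conditional likelihood of the response pattern at each node.
    cond = np.prod(np.where(y[None, :] == 1, p, 1.0 - p), axis=1)
    integral = np.sum(weights * cond) / np.sqrt(np.pi)

    # Mixture of the structural-zero component and the mixed-model component.
    all_zero = float(np.all(y == 0))
    return (1.0 - pi_rel) * all_zero + pi_rel * integral
```

Under these assumptions, the full log-likelihood would be the sum of `np.log(cluster_likelihood(...))` over clusters, which could then be passed to a general-purpose optimizer. One sanity check on the sketch is that the likelihoods of all possible response patterns for a cluster sum to one.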