Hsu Wei-Wen, Mawella Nadeesha R, Todem David
Department of Statistics, Kansas State University, Manhattan, KS 66506, USA.
Department of Mathematics and Statistics, University of Missouri-Kansas City, Kansas City, MO 64110, USA.
Int Stat Rev. 2022 Apr;90(1):62-77. doi: 10.1111/insr.12462. Epub 2021 Jul 5.
In many applications of two-component mixture models such as the popular zero-inflated model for discrete-valued data, it is customary for the data analyst to evaluate the inherent heterogeneity in view of observed data. To this end, the score test, acclaimed for its simplicity, is routinely performed. It has long been recognized that this test may behave erratically under model misspecification, but the implications of this behavior remain poorly understood for popular two-component mixture models. For the special case of zero-inflated count models, we use data simulations and theoretical arguments to evaluate this behavior and discuss its implications in settings where the working model is restrictive with regard to the true data generating mechanism. We enrich this discussion with an analysis of count data in HIV research, where a one-component model is shown to fit the data reasonably well despite apparent extra zeros. These results suggest that a rejection of homogeneity does not imply that the underlying mixture model is appropriate. Rather, such a rejection simply implies that the mixture model should be carefully interpreted in the light of potential model misspecifications, and further evaluated against other competing models.
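The behavior described above can be illustrated with a minimal sketch. It assumes the simplest working model, an intercept-only Poisson, for which the score statistic for zero inflation has a well-known closed form (commonly attributed to van den Broek, 1995) and is referred to a chi-square distribution with one degree of freedom. The abstract does not give this formula. The function names, the sample size, and the negative binomial parameters are illustrative choices and are not taken from the paper.

```python
import math
import random


def zip_score_test(counts):
    """Score test for zero inflation under an intercept-only Poisson working model.

    Assumed closed form (not stated in the abstract):
        S = (n0 - n*p0)^2 / (n*p0*(1 - p0) - n*ybar*p0^2),  p0 = exp(-ybar),
    compared with a chi-square distribution with 1 degree of freedom.
    """
    n = len(counts)
    ybar = sum(counts) / n
    if ybar == 0:
        raise ValueError("all counts are zero; the statistic is undefined")
    p0 = math.exp(-ybar)                      # fitted Poisson probability of a zero
    n0 = sum(1 for y in counts if y == 0)     # observed number of zeros
    stat = (n0 - n * p0) ** 2 / (n * p0 * (1 - p0) - n * ybar * p0 ** 2)
    p_value = math.erfc(math.sqrt(stat / 2))  # chi-square(1) upper tail probability
    return stat, p_value


def rpoisson(lam, rng):
    """Draw one Poisson(lam) variate with Knuth's multiplication method."""
    limit = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def rnegbin(mean, shape, rng):
    """Draw one negative binomial variate as a gamma-mixed Poisson (no zero inflation)."""
    return rpoisson(rng.gammavariate(shape, mean / shape), rng)


rng = random.Random(2022)

# Working model correct: plain Poisson counts.
pois_counts = [rpoisson(2.0, rng) for _ in range(500)]
pois_stat, pois_p = zip_score_test(pois_counts)

# Working model misspecified: overdispersed counts that contain no structural zeros.
nb_counts = [rnegbin(2.0, 0.5, rng) for _ in range(500)]
nb_stat, nb_p = zip_score_test(nb_counts)

print(f"Poisson data:           S = {pois_stat:.2f}, p = {pois_p:.3g}")
print(f"Negative binomial data: S = {nb_stat:.2f}, p = {nb_p:.3g}")
```

In this sketch the negative binomial sample contains no zero-inflation component, yet the Poisson-based score test would typically reject homogeneity, because overdispersion alone produces more zeros than the Poisson fit predicts. This is consistent with the abstract's point that a rejection does not by itself show the mixture model is appropriate.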