Kruppa Jochen, Hothorn Ludwig
Charité - Universitätsmedizin Berlin, corporate member of Freie Universität Berlin, Humboldt-Universität zu Berlin, and Berlin Institute of Health, Institute of Biometry and Clinical Epidemiology, Berlin, Germany.
Berlin Institute of Health (BIH), Berlin, Germany.
J Appl Stat. 2020 Jul 3;48(16):3220-3232. doi: 10.1080/02664763.2020.1788518. eCollection 2021.
Count data arise in many scientific fields. One way to analyze such data is to compare the individual levels of a treatment factor using multiple comparisons. However, the measured individuals are often clustered, e.g. by litter or rearing, and this must be taken into account when estimating the parameters with a repeated measurement model. In addition, ignoring the overdispersion to which count data are prone inflates the type I error rate. We carry out simulation studies under several data settings and compare multiple contrast tests based on parameter estimates from generalized estimating equations and from generalized linear mixed models with respect to coverage and rejection probabilities. We generate overdispersed, clustered count data in small samples, as observed in many biological settings. We found that generalized estimating equations outperform generalized linear mixed models if the sandwich variance estimator is correctly specified. Furthermore, generalized linear mixed models show convergence problems under certain data settings, although some model implementations are less affected. Finally, we use an example of genetic data to demonstrate the application of the multiple contrast test and the problems caused by ignoring strong overdispersion.
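As an illustration of the kind of data the abstract describes, the following is a minimal sketch of one way to generate overdispersed, clustered counts in small samples. It is not the authors' simulation design. The generating model (a normal cluster effect on the log scale combined with a gamma-Poisson mixture), the function names `rpois` and `simulate_clustered_counts`, and all parameter values are assumptions made for this example only.

```python
import math
import random
from statistics import mean, variance


def rpois(lam, rng):
    """Draw one Poisson(lam) count with Knuth's multiplication method.

    Adequate for the small to moderate means used in this sketch.
    """
    limit = math.exp(-lam)
    k, p = 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k


def simulate_clustered_counts(group_means, n_clusters=5, cluster_size=4,
                              cluster_sd=0.3, dispersion=2.0, seed=1):
    """Return (group, cluster_id, count) rows.

    Assumed model (hypothetical, for illustration):
      - each cluster (e.g. a litter) shares a normal effect on the log scale;
      - each observation gets a gamma multiplier with mean 1 and variance
        1/dispersion, so counts are negative binomial within a cluster.
    """
    rng = random.Random(seed)
    rows = []
    for group, mu in enumerate(group_means):
        for cluster in range(n_clusters):
            cluster_effect = rng.gauss(0.0, cluster_sd)
            for _ in range(cluster_size):
                extra = rng.gammavariate(dispersion, 1.0 / dispersion)
                lam = mu * math.exp(cluster_effect) * extra
                rows.append((group, f"g{group}c{cluster}", rpois(lam, rng)))
    return rows


if __name__ == "__main__":
    data = simulate_clustered_counts([10.0, 10.0, 15.0])
    for group in range(3):
        counts = [y for g, _, y in data if g == group]
        # Under a plain Poisson model the variance would be close to the mean.
        print(group, round(mean(counts), 2), round(variance(counts), 2))
```

Data of this shape could then be passed to a generalized estimating equations or generalized linear mixed model fit. In each group the sample variance would be expected to exceed the sample mean, which is the overdispersion the abstract warns against ignoring.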