School of Statistics, University of International Business and Economics, Beijing, China.
Department of Epidemiology, School of Public Health and Tropical Medicine, Tulane University, New Orleans, LA, USA.
Stat Methods Med Res. 2022 Nov;31(11):2237-2254. doi: 10.1177/09622802221115881. Epub 2022 Jul 27.
Human microbiome research has become a hotspot in health and medical research over the past decade, owing to the rapid development of modern high-throughput technologies. Typical data in a microbiome study, which consist of operational taxonomic unit counts, may exhibit over-dispersion and/or structural zeros. In such cases, negative binomial models can be applied to address over-dispersion, while zero-inflated negative binomial models can be applied to address both issues. In practice, it is essential to know whether the data are zero-inflated before choosing between the two: if there is no zero-inflation, a zero-inflated negative binomial model may be unnecessarily complex and difficult to interpret, and may even suffer from convergence issues. On the other hand, a negative binomial model may yield invalid inferences if the data do exhibit excess zeros. In this paper, we develop a new test for detecting zero-inflation that results from a latent class of subjects with structural zeros in a negative binomial regression model. The test directly compares the number of observed zeros with the number expected under the negative binomial regression model. A closed form of the test statistic and its asymptotic properties are derived based on estimating equations. Extensive simulation studies are conducted to investigate the performance of the new test and to compare it with the classical Wald, likelihood ratio, and score tests. The tests are also applied to human gut microbiome data to test for a latent class in microbial genera.
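The core comparison described above, observed zeros against the zeros a negative binomial regression would predict, can be illustrated with a minimal sketch. It is not the test statistic from the paper: it computes only the raw difference in zero proportions and omits the variance term that the paper derives from estimating equations. The sketch assumes the common NB2 parameterization (variance = mu + alpha * mu^2) and assumes the fitted means and the dispersion `alpha` have already been estimated elsewhere. The function names and the toy data are hypothetical.

```python
def nb_zero_prob(mu, alpha):
    """P(Y = 0) for a negative binomial with mean mu and
    variance mu + alpha * mu**2 (NB2 parameterization, alpha > 0)."""
    return (1.0 + alpha * mu) ** (-1.0 / alpha)


def zero_excess(counts, fitted_means, alpha):
    """Observed proportion of zeros minus the average zero probability
    implied by the fitted negative binomial means.

    A clearly positive value hints at more zeros than the negative
    binomial model expects; it is NOT a standardized test statistic.
    """
    n = len(counts)
    observed = sum(1 for y in counts if y == 0) / n
    expected = sum(nb_zero_prob(m, alpha) for m in fitted_means) / n
    return observed - expected


# Hypothetical toy data: ten subjects in two covariate groups whose
# fitted means (2.0 and 4.0) and dispersion (0.5) are assumed known.
counts = [0, 0, 0, 3, 0, 7, 0, 1, 0, 12]
fitted_means = [2.0] * 5 + [4.0] * 5
alpha = 0.5

print(zero_excess(counts, fitted_means, alpha))
```

To turn this difference into a formal test it would have to be divided by a standard error that accounts for the estimation of the regression coefficients and the dispersion, which is what the asymptotic results mentioned in the abstract provide.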