Zhang Xinyan, Guo Boyi, Yi Nengjun
Department of Statistics and Data Analytics, Kennesaw State University, Kennesaw, GA, United States of America.
Department of Biostatistics, University of Alabama at Birmingham, Birmingham, AL, United States of America.
PLoS One. 2020 Nov 9;15(11):e0242073. doi: 10.1371/journal.pone.0242073. eCollection 2020.
The human microbiome is variable and dynamic in nature. Longitudinal studies can help explain the mechanisms that maintain the microbiome in health or cause dysbiosis in disease. However, it remains challenging to properly analyze longitudinal microbiome data from either 16S rRNA or shotgun metagenomic sequencing studies, whether output as proportions or as counts. Most microbiome data are sparse, requiring statistical models that handle zero-inflation. Moreover, a longitudinal design induces correlation among samples, which further complicates the analysis and interpretation of microbiome data.
In this article, we propose zero-inflated Gaussian mixed models (ZIGMMs) to analyze longitudinal microbiome data. ZIGMMs are a robust and flexible approach applicable to longitudinal microbiome proportion data or count data generated with either 16S rRNA or shotgun sequencing technologies. They can include various types of fixed effects and random effects, account for various within-subject correlation structures, and effectively handle zero-inflation. We developed an efficient Expectation-Maximization (EM) algorithm to fit ZIGMMs by taking advantage of the standard procedure for fitting linear mixed models. We demonstrate the computational efficiency of our EM algorithm by comparing it with two other zero-inflated methods. Through extensive simulations, we show that ZIGMMs outperform the previously used linear mixed models (LMMs), negative binomial mixed models (NBMMs), and the zero-inflated Beta regression mixed model (ZIBR) in detecting associated effects in longitudinal microbiome data. We also apply our method to two public longitudinal microbiome datasets and compare it with LMMs and NBMMs in detecting dynamic effects of associated taxa.
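The EM idea described above can be illustrated with a deliberately simplified sketch. It is not the authors' implementation: it assumes a constant zero-inflation probability, drops the random effects and within-subject correlation structures entirely, and therefore replaces the linear mixed model fit in the M-step with ordinary weighted least squares. The function name `fit_zig` and the simulated data are hypothetical and serve only as illustration.

```python
import numpy as np

def fit_zig(y, X, n_iter=200, tol=1e-8):
    """EM for a zero-inflated Gaussian regression (illustrative sketch).

    Assumptions (not from the paper): fixed effects only, constant
    zero-inflation probability p, weighted least squares in the M-step.
    """
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    is_zero = (y == 0)
    nz = ~is_zero

    # Initial values: share of zeros, and a fit to the non-zero responses.
    p = float(np.clip(is_zero.mean(), 0.01, 0.99))
    beta, *_ = np.linalg.lstsq(X[nz], y[nz], rcond=None)
    sigma2 = float(np.var(y[nz] - X[nz] @ beta)) + 1e-8

    for _ in range(n_iter):
        # E-step: posterior probability that each observed zero comes
        # from the zero-inflation state; non-zero responses get 0.
        mu = X @ beta
        dens0 = np.exp(-0.5 * mu**2 / sigma2) / np.sqrt(2 * np.pi * sigma2)
        z = np.where(is_zero, p / (p + (1 - p) * dens0), 0.0)

        # M-step: update p, then refit the Gaussian part with weights
        # 1 - z (probability of belonging to the Gaussian state).
        p_new = float(z.mean())
        w = 1.0 - z
        sw = np.sqrt(w)
        beta_new, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
        resid = y - X @ beta_new
        sigma2 = float((w * resid**2).sum() / w.sum())

        change = abs(p_new - p) + np.abs(beta_new - beta).max()
        p, beta = p_new, beta_new
        if change < tol:
            break

    return {"p": p, "beta": beta, "sigma2": sigma2}

# Simulated example: about 30% structural zeros on top of a Gaussian response.
rng = np.random.default_rng(0)
n = 2000
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
y = 4.0 + 0.5 * x + rng.normal(size=n)
y[rng.random(n) < 0.3] = 0.0

fit = fit_zig(y, X)
```

In the full method, the weighted least-squares step would be replaced by a weighted linear mixed model fit, which is how the algorithm reuses the standard procedure for fitting linear mixed models.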