Department of Mathematical Sciences, Chalmers University of Technology and University of Gothenburg, Gothenburg, Sweden.
Computational Systems Biology, Chalmers University of Technology, Gothenburg, Sweden.
Stat Methods Med Res. 2019 Dec;28(12):3712-3728. doi: 10.1177/0962280218811354. Epub 2018 Nov 25.
Metagenomics enables the study of gene abundances in complex mixtures of microorganisms and has become a standard methodology for the analysis of the human microbiome. However, gene abundance data are inherently noisy and contain high levels of biological and technical variability, as well as an excess of zeros due to non-detected genes. This makes the statistical analysis challenging. In this study, we present a new hierarchical Bayesian model for inference from metagenomic gene abundance data. The model uses a zero-inflated overdispersed Poisson distribution, which simultaneously captures the high gene-specific variability and the excess of zero observations in the data. By analysis of three comprehensive datasets, we show that zero-inflation is common in metagenomic data from the human gut and that, if not correctly modelled, it can lead to substantial reductions in statistical power. We also show, using resampled metagenomic data, that our model detects differentially abundant genes with higher and more stable performance than other methods. We conclude that proper modelling of the gene-specific variability, including the excess of zeros, is necessary to accurately describe gene abundances in metagenomic data. The proposed model will thus pave the way for new biological insights into the structure of microbial communities.
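To make "zero-inflated overdispersed Poisson" concrete, the sketch below evaluates the probability mass of one gene count under such a distribution. It is an illustration only, not the authors' model: the abstract does not specify the form of the overdispersion or the hierarchical priors, so here we assume a gamma-distributed Poisson rate (which gives a negative binomial with mean `mu` and variance `mu + phi * mu**2`) and a structural-zero probability `pi`. The function names `nb_logpmf` and `zinb_pmf` are hypothetical.

```python
import math


def nb_logpmf(y: int, mu: float, phi: float) -> float:
    """Log-pmf of a negative binomial with mean mu and dispersion phi > 0.

    Assumption: overdispersion comes from a gamma-distributed Poisson rate,
    so Var(Y) = mu + phi * mu**2 (phi -> 0 recovers the plain Poisson).
    """
    r = 1.0 / phi  # gamma shape parameter
    return (
        math.lgamma(y + r)
        - math.lgamma(r)
        - math.lgamma(y + 1)
        + r * math.log(r / (r + mu))
        + y * math.log(mu / (r + mu))
    )


def zinb_pmf(y: int, mu: float, phi: float, pi: float) -> float:
    """Zero-inflated pmf: with probability pi the count is a structural zero
    (e.g. a gene that is absent); otherwise it follows the overdispersed
    Poisson (negative binomial) above, which can also produce zeros."""
    nb = math.exp(nb_logpmf(y, mu, phi))
    if y == 0:
        return pi + (1.0 - pi) * nb
    return (1.0 - pi) * nb


if __name__ == "__main__":
    # Compare the probability of a zero count with and without zero-inflation.
    for pi in (0.0, 0.3):
        print(f"pi={pi}: P(Y=0) = {zinb_pmf(0, mu=5.0, phi=0.5, pi=pi):.4f}")
```

Zeros arise from two sources in this sketch: the structural-zero component and the count component itself. A model without the `pi` term has to explain every excess zero by inflating the dispersion, which is one way a missing zero-inflation component can distort inference on the count part.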