Zong Yuxuan, Zhao Hongyu, Wang Tao
Department of Bioinformatics and Biostatistics, Shanghai Jiao Tong University, Shanghai, China.
SJTU-Yale Joint Center of Biostatistics and Data Science, Shanghai Jiao Tong University, Shanghai, China.
Brief Bioinform. 2024 Mar 27;25(3). doi: 10.1093/bib/bbae205.
Potentially pathogenic or probiotic microbes can be identified by comparing their abundance levels between healthy and diseased populations, or more broadly, by linking microbiome composition with clinical phenotypes or environmental factors. However, in microbiome studies, feature tables provide relative rather than absolute abundance of each feature in each sample, as the microbial loads of the samples and the ratios of sequencing depth to microbial load are both unknown and subject to considerable variation. Moreover, microbiome abundance data are count-valued, often over-dispersed and contain a substantial proportion of zeros. To carry out differential abundance analysis while addressing these challenges, we introduce mbDecoda, a model-based approach for debiased analysis of sparse compositions of microbiomes. mbDecoda employs a zero-inflated negative binomial model, linking mean abundance to the variable of interest through a log link function, and it accommodates the adjustment for confounding factors. To efficiently obtain maximum likelihood estimates of model parameters, an Expectation Maximization algorithm is developed. A minimum coverage interval approach is then proposed to rectify compositional bias, enabling accurate and reliable absolute abundance analysis. Through extensive simulation studies and analysis of real-world microbiome datasets, we demonstrate that mbDecoda compares favorably with state-of-the-art methods in terms of effectiveness, robustness and reproducibility.
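The modelling ingredients named in the abstract (a zero-inflated negative binomial likelihood, a log link for the mean, and an EM-style treatment of excess zeros) can be illustrated with a minimal sketch. The parameterization below is an assumption made for illustration, not the authors' implementation: the mean is `mu`, the dispersion `phi` gives variance `mu + phi * mu**2`, and `pi` is the structural-zero probability. The function names and the single-covariate form of the link are hypothetical. The bias-correction step (the minimum coverage interval) is not shown, since the abstract does not describe its mechanics.

```python
import math


def nb_log_pmf(y, mu, phi):
    """Log pmf of a negative binomial with mean mu and dispersion phi.

    Assumed parameterization: variance = mu + phi * mu**2, size r = 1 / phi.
    """
    r = 1.0 / phi
    return (
        math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
        + r * math.log(r / (r + mu))
        + y * math.log(mu / (r + mu))
    )


def zinb_log_pmf(y, mu, phi, pi):
    """Log pmf of a zero-inflated negative binomial, with 0 < pi < 1.

    With probability pi the count is a structural zero; otherwise it is
    drawn from the negative binomial above.
    """
    nb = nb_log_pmf(y, mu, phi)
    if y == 0:
        # A zero can come from either mixture component.
        return math.log(pi + (1.0 - pi) * math.exp(nb))
    return math.log(1.0 - pi) + nb


def mean_from_log_link(x, beta0, beta1):
    """Log link with one covariate (hypothetical form): log(mu) = beta0 + beta1 * x."""
    return math.exp(beta0 + beta1 * x)


def e_step_weight(y, mu, phi, pi):
    """Posterior probability that an observed count is a structural zero.

    This is the quantity a typical EM E-step for a zero-inflated model
    computes; positive counts can never be structural zeros.
    """
    if y > 0:
        return 0.0
    nb_zero = math.exp(nb_log_pmf(0, mu, phi))
    return pi / (pi + (1.0 - pi) * nb_zero)
```

For instance, with `mu = 2.0`, `phi = 0.5` and `pi = 0.2`, the negative binomial component assigns probability 0.25 to a zero, so the mixture assigns 0.2 + 0.8 × 0.25 = 0.4, and `e_step_weight(0, 2.0, 0.5, 0.2)` attributes half of an observed zero to the structural component.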