Transformation and differential abundance analysis of microbiome data incorporating phylogeny.

Department of Bioinformatics and Biostatistics, Shanghai Jiao Tong University, 200240 Shanghai, China.

SJTU-Yale Joint Center for Biostatistics and Data Science, Shanghai Jiao Tong University, 200240 Shanghai, China.

Bioinformatics. 2021 Dec 11;37(24):4652-4660. doi: 10.1093/bioinformatics/btab543.

MOTIVATION

Microbiome data have proven extremely useful for understanding microbial communities and their impacts in health and disease. Although microbiome analysis methods and standards are evolving rapidly, obtaining meaningful and interpretable results from microbiome studies still requires careful statistical treatment. In particular, many existing and emerging methods for differential abundance (DA) analysis fail to account for the fact that microbiome data are high-dimensional and sparse, compositional, negatively and positively correlated and phylogenetically structured. To better describe microbiome data and improve the power of DA testing, there is still a great need for the continued development of appropriate statistical methodology.

RESULTS

In this article, we propose a model-based approach for microbiome data transformation and a phylogenetically informed procedure for DA testing based on the transformed data. First, we extend the Dirichlet-tree multinomial (DTM) to a zero-inflated DTM for multivariate modeling of microbial counts, addressing data sparsity as well as the correlation and phylogenetic relationships among bacterial taxa. Then, within this framework and using a Bayesian formulation, we introduce the posterior mean transformation to convert raw counts into non-zero relative abundances that sum to one, accounting for the compositional nature of microbiome data. Second, using the transformed data, we propose adaptive analysis of composition of microbiomes (adaANCOM) for DA testing, which constructs log-ratios adaptively on the tree for each taxon and thereby greatly reduces the computational complexity of ANCOM in high dimensions. Finally, we present extensive simulation studies, an analysis of Human Microbiome Project (HMP) data across 18 body sites and 2 visits, and an application to a gut microbiome and malnutrition study to investigate the performance of the posterior mean transformation and adaANCOM. Comparisons with ANCOM and other DA testing procedures show that adaANCOM controls the false discovery rate well, allows for easy interpretation of the results, and is computationally efficient for high-dimensional problems.
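
To make the posterior mean transformation concrete, the following is a minimal Python sketch, not the authors' implementation (the adaANCOM package itself is written in R). It assumes a plain Dirichlet-tree multinomial without the zero-inflation component, a toy three-taxon tree, and fixed Dirichlet parameters `alpha`, which in practice would be estimated from the data. All names (`children`, `alpha`, `counts`, `posterior_mean_dtm`) are hypothetical. Under these assumptions, the posterior mean of each branch probability at an internal node is (subtree count + alpha) divided by the node total, and a leaf's transformed abundance is the product of the branch means along its path from the root.

```python
# Sketch only: posterior mean under a plain Dirichlet-tree multinomial.
# Assumptions (not from the paper): no zero-inflation, toy tree, fixed alpha.

# Hypothetical tree: root -> (A, t3), A -> (t1, t2); t1, t2, t3 are leaf taxa.
children = {"root": ["A", "t3"], "A": ["t1", "t2"]}

# Hypothetical Dirichlet parameter for each child branch.
alpha = {"A": 1.0, "t3": 1.0, "t1": 0.5, "t2": 0.5}

# Raw counts for one sample; t1 is an observed zero.
counts = {"t1": 0, "t2": 6, "t3": 4}


def subtree_count(node, children, counts):
    """Total raw count of all leaves below (or at) `node`."""
    if node not in children:
        return counts[node]
    return sum(subtree_count(c, children, counts) for c in children[node])


def posterior_mean_dtm(children, alpha, counts, root="root"):
    """Return leaf proportions as products of node-wise posterior means."""
    props = {}

    def walk(node, prob):
        if node not in children:  # leaf: store the accumulated product
            props[node] = prob
            return
        kids = children[node]
        totals = {c: subtree_count(c, children, counts) for c in kids}
        denom = sum(totals[c] + alpha[c] for c in kids)
        for c in kids:
            # Posterior mean of the branch probability for child c.
            walk(c, prob * (totals[c] + alpha[c]) / denom)

    walk(root, 1.0)
    return props


props = posterior_mean_dtm(children, alpha, counts)
for taxon in sorted(props):
    print(taxon, round(props[taxon], 4))
```

In this toy sample, taxon `t1` has a raw count of zero but still receives a small positive abundance, and the transformed values sum to one, so log-ratios between taxa are well defined.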

AVAILABILITY AND IMPLEMENTATION

The developed R package is available at https://github.com/ZRChao/adaANCOM. For replicability purposes, scripts for our simulations and data analysis are available at https://github.com/ZRChao/Papers_supplementary.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.

Similar articles

An empirical Bayes approach to normalization and differential abundance testing for microbiome data.

BMC Bioinformatics. 2020 Jun 3;21(1):225. doi: 10.1186/s12859-020-03552-z.

fastANCOM: a fast method for analysis of compositions of microbiomes.

Bioinformatics. 2022 Mar 28;38(7):2039-2041. doi: 10.1093/bioinformatics/btac060.

Analysis of composition of microbiomes: a novel method for studying microbial composition.

Microb Ecol Health Dis. 2015 May 29;26:27663. doi: 10.3402/mehd.v26.27663. eCollection 2015.

Batch effects correction for microbiome data with Dirichlet-multinomial regression.

Bioinformatics. 2019 Mar 1;35(5):807-814. doi: 10.1093/bioinformatics/bty729.

A novel normalization and differential abundance test framework for microbiome data.

Bioinformatics. 2020 Jul 1;36(13):3959-3965. doi: 10.1093/bioinformatics/btaa255.

LOCOM: A logistic regression model for testing differential abundance in compositional microbiome data with false discovery rate control.

Proc Natl Acad Sci U S A. 2022 Jul 26;119(30):e2122788119. doi: 10.1073/pnas.2122788119. Epub 2022 Jul 22.

Sparse least trimmed squares regression with compositional covariates for high-dimensional data.

Bioinformatics. 2021 Nov 5;37(21):3805-3814. doi: 10.1093/bioinformatics/btab572.

pldist: ecological dissimilarities for paired and longitudinal microbiome association analysis.

Bioinformatics. 2019 Oct 1;35(19):3567-3575. doi: 10.1093/bioinformatics/btz120.

An adaptive direction-assisted test for microbiome compositional data.

Bioinformatics. 2022 Jul 11;38(14):3493-3500. doi: 10.1093/bioinformatics/btac361.

Cited by

Analyzing microbiome data with taxonomic misclassification using a zero-inflated Dirichlet-multinomial model.

BMC Bioinformatics. 2025 Feb 27;26(1):69. doi: 10.1186/s12859-025-06078-4.

Assessing the effect of model specification and prior sensitivity on Bayesian tests of temporal signal.

PLoS Comput Biol. 2024 Nov 6;20(11):e1012371. doi: 10.1371/journal.pcbi.1012371. eCollection 2024 Nov.

fastCCLasso: a fast and efficient algorithm for estimating correlation matrix from compositional data.

Bioinformatics. 2024 May 2;40(5). doi: 10.1093/bioinformatics/btae314.

mi-Mic: a novel multi-layer statistical test for microbiota-disease associations.

Genome Biol. 2024 May 1;25(1):113. doi: 10.1186/s13059-024-03256-0.

Multiscale adaptive differential abundance analysis in microbial compositional data.

Bioinformatics. 2023 Apr 3;39(4). doi: 10.1093/bioinformatics/btad178.

phyloMDA: an R package for phylogeny-aware microbiome data analysis.

BMC Bioinformatics. 2022 Jun 6;23(1):213. doi: 10.1186/s12859-022-04744-5.

tascCODA: Bayesian Tree-Aggregated Analysis of Compositional Amplicon and Single-Cell Data.

Front Genet. 2021 Dec 7;12:766405. doi: 10.3389/fgene.2021.766405. eCollection 2021.
