Dirichlet-Tree Multinomial Mixtures for Clustering Microbiome Compositions

Author information

Mao Jialiang, Ma Li

Affiliation

Department of Statistical Science, Duke University.

Publication information

Ann Appl Stat. 2022 Sep;16(3):1476-1499. doi: 10.1214/21-aoas1552. Epub 2022 Jul 19.

DOI:10.1214/21-aoas1552

PMID:36127929

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9484567/

Abstract

Interest in the human microbiome has grown substantially in recent years, and a common task in analyzing these data is to cluster microbiome compositions into subtypes. Subdividing samples into subgroups serves as an intermediate step toward personalized diagnosis and treatment. In applying existing clustering methods to modern microbiome studies, including the American Gut Project (AGP) data, we found that this seemingly standard task is very challenging in the microbiome composition context because of several key features of such data. Standard distance-based clustering algorithms generally do not produce reliable results because they do not account for the heterogeneity of cross-sample variability among the bacterial taxa, while existing model-based approaches are not flexible enough to distinguish complex within-cluster variation from cross-cluster variation. Direct application of such methods generally leads to overly dispersed clusters in the AGP data, and this phenomenon is common in other microbiome data. To overcome these challenges, we introduce Dirichlet-tree multinomial mixtures (DTMM) as a Bayesian generative model for clustering amplicon sequencing data in microbiome studies. DTMM models the microbiome population with a mixture of Dirichlet-tree kernels that use the phylogenetic tree to offer a more flexible covariance structure for characterizing within-cluster variation, and it provides a means of identifying a subset of signature taxa that distinguish the clusters. We perform extensive simulation studies to evaluate the performance of DTMM and compare it with state-of-the-art model-based and distance-based clustering methods in the microbiome context, and we carry out a validation study on a publicly available longitudinal data set to confirm the biological relevance of the clusters. Finally, we report a case study on the fecal data from the AGP to identify compositional clusters among individuals with inflammatory bowel disease and diabetes. Among our most interesting findings is that enterotypes (i.e., gut microbiome clusters) are not always defined by the most dominant species, as previous analyses had assumed, but can involve a number of less abundant operational taxonomic units (OTUs), which cannot be identified with existing distance-based and model-based approaches.
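
To make the generative idea in the abstract concrete, the following is a minimal sketch of sampling OTU counts from a mixture of Dirichlet-tree multinomial kernels. It is not the authors' implementation: the four-leaf tree, the node names, the parameter values, and the functions `sample_dtm_counts` and `sample_mixture` are all hypothetical choices made for illustration, and the Bayesian inference and signature-taxa selection described above are not attempted here.

```python
import numpy as np

# Hypothetical toy phylogenetic tree over 4 OTUs (leaves are the ints 0..3).
# Each internal node maps to its two children; the names are illustrative only.
TREE = {
    "root": ("A", "B"),
    "A": (0, 1),
    "B": (2, 3),
}
N_LEAVES = 4


def sample_dtm_counts(rng, alpha, depth):
    """Draw one vector of OTU counts from a Dirichlet-tree multinomial.

    alpha maps each internal node to Dirichlet parameters over its children.
    The `depth` reads are split recursively from the root down to the leaves.
    """
    counts = np.zeros(N_LEAVES, dtype=int)
    stack = [("root", depth)]
    while stack:
        node, n = stack.pop()
        if not isinstance(node, str):  # leaf: record the reads that arrived
            counts[node] += n
            continue
        probs = rng.dirichlet(alpha[node])   # node-specific branching probabilities
        split = rng.multinomial(n, probs)    # divide the reads among the children
        for child, n_child in zip(TREE[node], split):
            stack.append((child, int(n_child)))
    return counts


def sample_mixture(rng, weights, alphas, depth, n_samples):
    """Pick a cluster label for each sample, then draw counts from that cluster's kernel."""
    labels = rng.choice(len(weights), size=n_samples, p=weights)
    counts = np.array([sample_dtm_counts(rng, alphas[k], depth) for k in labels])
    return labels, counts


rng = np.random.default_rng(0)
# Two assumed clusters that differ only in the root split; nodes A and B are shared.
alphas = [
    {"root": [8.0, 2.0], "A": [5.0, 5.0], "B": [1.0, 1.0]},
    {"root": [2.0, 8.0], "A": [5.0, 5.0], "B": [1.0, 1.0]},
]
labels, counts = sample_mixture(rng, [0.5, 0.5], alphas, depth=1000, n_samples=6)
print(labels)
print(counts)
```

In this toy setup each internal node carries its own Dirichlet parameters, so dispersion can differ across parts of the tree, which is one way to read the "more flexible covariance structure" mentioned above. The two clusters differ at a single node, loosely echoing the notion that only a subset of taxa distinguishes the clusters.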

Similar articles

Sparse tree-based clustering of microbiome data to characterize microbiome heterogeneity in pancreatic cancer.

J R Stat Soc Ser C Appl Stat. 2023 Jan;72(1):20-36. doi: 10.1093/jrsssc/qlac002. Epub 2023 Feb 13.

Bayesian biclustering for microbial metagenomic sequencing data via multinomial matrix factorization.

Biostatistics. 2022 Jul 18;23(3):891-909. doi: 10.1093/biostatistics/kxab002.

A Dirichlet-Multinomial Bayes Classifier for Disease Diagnosis with Microbial Compositions.

mSphere. 2017 Dec 13;2(6). doi: 10.1128/mSphereDirect.00536-17. eCollection 2017 Nov-Dec.

tascCODA: Bayesian Tree-Aggregated Analysis of Compositional Amplicon and Single-Cell Data.

Front Genet. 2021 Dec 7;12:766405. doi: 10.3389/fgene.2021.766405. eCollection 2021.

An integrative Bayesian Dirichlet-multinomial regression model for the analysis of taxonomic abundances in microbiome data.

BMC Bioinformatics. 2017 Feb 8;18(1):94. doi: 10.1186/s12859-017-1516-0.

Stochastic variational variable selection for high-dimensional microbiome data.

Microbiome. 2022 Dec 24;10(1):236. doi: 10.1186/s40168-022-01439-0.

Microbiome subcommunity learning with logistic-tree normal latent Dirichlet allocation.

Biometrics. 2023 Sep;79(3):2321-2332. doi: 10.1111/biom.13772. Epub 2022 Oct 28.

A Dirichlet-tree multinomial regression model for associating dietary nutrients with gut microorganisms.

Biometrics. 2017 Sep;73(3):792-801. doi: 10.1111/biom.12654. Epub 2017 Jan 23.

MicroBVS: Dirichlet-tree multinomial regression models with Bayesian variable selection - an R package.

BMC Bioinformatics. 2020 Jul 13;21(1):301. doi: 10.1186/s12859-020-03640-0.

Cited by

Credible inferences in microbiome research: ensuring rigour, reproducibility and relevance in the era of AI.

Nat Rev Gastroenterol Hepatol. 2025 Jul 31. doi: 10.1038/s41575-025-01100-9.

Multivariate Poisson lognormal distribution for modeling counts from modern biological data: An overview.

Comput Struct Biotechnol J. 2025 Mar 20;27:1255-1264. doi: 10.1016/j.csbj.2025.03.017. eCollection 2025.

Analysis of Microbiome Data.

Annu Rev Stat Appl. 2024 Apr;11(1):483-504. doi: 10.1146/annurev-statistics-040522-120734. Epub 2023 Oct 13.

Updating Urinary Microbiome Analyses to Enhance Biologic Interpretation.

Front Cell Infect Microbiol. 2022 Jul 8;12:789439. doi: 10.3389/fcimb.2022.789439. eCollection 2022.
