Bayesian biclustering for microbial metagenomic sequencing data via multinomial matrix factorization.

Author information

Department of Statistics, Texas A&M University, College Station, TX, USA and Institute of Statistics and Big Data, Renmin University of China, Beijing, China.

Center for Applied Statistics, Institute of Statistics and Big Data, Renmin University of China, Beijing, China.

Publication information

Biostatistics. 2022 Jul 18;23(3):891-909. doi: 10.1093/biostatistics/kxab002.

Abstract

High-throughput sequencing technology provides unprecedented opportunities to quantitatively explore the human gut microbiome and its relation to diseases. Microbiome data are compositional, sparse, noisy, and heterogeneous, which poses serious challenges for statistical modeling. We propose an identifiable Bayesian multinomial matrix factorization model to infer overlapping clusters on both microbes and hosts. The proposed method represents the observed over-dispersed, zero-inflated count matrix as Dirichlet-multinomial mixtures on which latent cluster structures are built hierarchically. Under the Bayesian framework, the number of clusters is determined automatically and available information from a taxonomic rank tree of microbes is naturally incorporated, which greatly improves the interpretability of our findings. We demonstrate the utility of the proposed approach by comparing it with alternative methods in simulations. An application to a human gut microbiome data set involving patients with inflammatory bowel disease reveals interesting clusters, which contain the bacterial families Bacteroidaceae, Bifidobacteriaceae, Enterobacteriaceae, Fusobacteriaceae, Lachnospiraceae, Ruminococcaceae, Pasteurellaceae, and Porphyromonadaceae, all of which are known to be related to inflammatory bowel disease and its subtypes according to the biological literature. Our findings can help generate hypotheses for future investigation of the heterogeneity of the human gut microbiome.
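
To make the Dirichlet-multinomial idea concrete, the following is a minimal simulation sketch. It is not the authors' model or code. The function `simulate_counts`, the binary `host_membership` matrix, the `microbe_loading` matrix, the `base` concentration, and the toy dimensions are all hypothetical choices made for illustration. Each host's composition is drawn from a Dirichlet distribution whose concentration depends on the (possibly overlapping) clusters the host belongs to, and read counts are then drawn from a multinomial at a fixed sequencing depth.

```python
import numpy as np


def simulate_counts(host_membership, microbe_loading, depth, base=0.1, seed=0):
    """Draw a hosts x microbes count matrix from a Dirichlet-multinomial.

    Illustrative assumption: the Dirichlet concentration for a host is a
    small baseline plus the summed loadings of the clusters it belongs to.
    """
    rng = np.random.default_rng(seed)
    alpha = base + host_membership @ microbe_loading
    counts = np.empty(alpha.shape, dtype=np.int64)
    for i, a in enumerate(alpha):
        composition = rng.dirichlet(a)               # host-specific proportions
        counts[i] = rng.multinomial(depth, composition)  # sequencing reads
    return counts


# Hypothetical toy setup: 4 hosts, 2 clusters, 5 microbes.
# Host 1 belongs to both clusters (overlap); host 3 belongs to none.
host_membership = np.array([[1, 0],
                            [1, 1],
                            [0, 1],
                            [0, 0]])
microbe_loading = np.array([[5.0, 5.0, 0.0, 0.0, 0.0],
                            [0.0, 0.0, 0.0, 5.0, 5.0]])

counts = simulate_counts(host_membership, microbe_loading, depth=1000)
print(counts)
```

Because microbes outside a host's clusters keep only the small baseline concentration, their simulated counts are often zero or near zero, which mimics the sparsity and over-dispersion the abstract describes. Each row sums to the chosen depth, reflecting the compositional nature of the data.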
