Department of Statistical Science, Southern Methodist University, Dallas, TX 75275, United States.
Quantitative Biomedical Research Center, Department of Population and Data Sciences, University of Texas Southwestern Medical Center, Dallas, TX 75390, United States.
J Mol Biol. 2022 Aug 15;434(15):167693. doi: 10.1016/j.jmb.2022.167693. Epub 2022 Jun 28.
The human microbiome consists of trillions of microorganisms. Microbiota can modulate host physiology through molecule and metabolite interactions. Integrating microbiome and metabolomics data has the potential to predict different diseases more accurately. Yet most datasets measure only microbiome data, without paired metabolome data. Here, we propose a novel integrative modeling framework, the Microbiome-based Supervised Contrastive Learning Framework (MB-SupCon). MB-SupCon integrates microbiome and metabolome data to generate microbiome embeddings, which can be used to improve prediction accuracy in datasets that measure only microbiome data. As a proof of concept, we applied MB-SupCon to 720 samples with paired 16S microbiome and metabolomics data from patients with type 2 diabetes. MB-SupCon outperformed existing prediction methods and achieved high average prediction accuracies for insulin resistance status (84.62%), sex (78.98%), and race (80.04%). Moreover, the microbiome embeddings form separable clusters for different covariate groups in the lower-dimensional space, which enhances data visualization. We also applied MB-SupCon to a large inflammatory bowel disease study and observed similar advantages. Thus, MB-SupCon could be broadly applicable for improving microbiome prediction models in multi-omics disease studies.
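To make the idea of a supervised contrastive objective over paired microbiome and metabolome data more concrete, the following is a minimal NumPy sketch of a generic cross-modal supervised contrastive loss. It is an illustration under stated assumptions, not the published MB-SupCon implementation: the function names, the temperature value, and the choice to treat every metabolome embedding sharing the anchor's covariate label as a positive are all hypothetical, and the encoder networks that would produce the embeddings are omitted.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit Euclidean length (eps guards against zero rows)."""
    return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), eps)


def cross_modal_supcon_loss(z_microbiome, z_metabolome, labels, temperature=0.5):
    """Generic supervised contrastive loss across two paired modalities.

    Assumption (not taken from the paper): each microbiome embedding is an
    anchor, and its positives are all metabolome embeddings whose sample has
    the same covariate label, including the anchor's own paired sample.
    """
    a = l2_normalize(np.asarray(z_microbiome, dtype=float))
    b = l2_normalize(np.asarray(z_metabolome, dtype=float))
    labels = np.asarray(labels)

    # Cosine similarities between every microbiome/metabolome pair, scaled.
    logits = a @ b.T / temperature
    # Subtract the row maximum for a numerically stable log-softmax.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))

    # positives[i, j] is True when samples i and j share a label.
    positives = labels[:, None] == labels[None, :]
    # Average log-probability over each anchor's positives, then over anchors.
    per_anchor = -(log_prob * positives).sum(axis=1) / positives.sum(axis=1)
    return float(per_anchor.mean())
```

In a sketch like this, the loss is lower when samples with the same label sit close together across the two modalities, which is the property that would let the microbiome embeddings alone carry label-relevant structure at prediction time.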