Department of Biostatistics, Gillings School of Global Public Health, University of North Carolina at Chapel Hill, Chapel Hill, NC, United States.
Carolina Health Informatics Program, University of North Carolina at Chapel Hill, Chapel Hill, NC, United States.
Front Cell Infect Microbiol. 2021 Oct 25;11:734416. doi: 10.3389/fcimb.2021.734416. eCollection 2021.
Microbiome data are becoming increasingly available in large health cohorts, yet metabolomics data are still scant. While many studies generate microbiome data, they lack matched metabolomics data or have considerable proportions of missing metabolite measurements. Since metabolomics is key to understanding microbial and general biological activities, the possibility of imputing individual metabolites or inferring metabolic pathways from microbial taxonomy or metagenomics is intriguing. Importantly, current metabolomics profiling methods such as the HMP Unified Metabolic Analysis Network (HUMAnN) have unknown accuracy and are limited in their ability to predict individual metabolites. To address this gap, we developed a novel metabolite prediction method, and we present its application and evaluation in an oral microbiome study. The new method for predicting metabolites using microbiome data (ENVIM) is based on the elastic net model (ENM). ENVIM adds an extra step to ENM that takes variable importance (VI) scores into account, and thus achieves better predictive performance. We investigate the metabolite prediction performance of ENVIM using metagenomic and metatranscriptomic data in a supragingival biofilm multi-omics dataset of 289 children aged 3-5 years who participated in a community-based study of early childhood oral health (ZOE 2.0) in North Carolina, United States. We further validate ENVIM in two additional publicly available multi-omics datasets generated from studies of gut health. We select gene family sets based on variable importance scores and modify the existing ENM strategy used in the MelonnPan prediction software to accommodate the unique features of microbiome and metabolome data. We evaluate metagenomic and metatranscriptomic predictors and compare the prediction performance of ENVIM to the standard ENM employed in MelonnPan.
The newly developed ENVIM method showed higher metabolite prediction accuracy than MelonnPan when trained with metatranscriptomics data only, metagenomics data only, or both. Metabolite prediction was better in the gut microbiome setting than in the oral microbiome setting. We report the best-predicted compounds in all three datasets, which span two body sites. For example, the metabolites trehalose, maltose, stachyose, and ribose are all well predicted by the supragingival microbiome.
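The two-stage idea described above (score the candidate predictors, keep those above a variable importance cutoff, then fit an elastic net on the retained set) can be illustrated with a minimal sketch. This is not the ENVIM implementation: the abstract does not specify how VI scores are computed, so the absolute Pearson correlation is used here purely as a stand-in, and the names `importance_scores`, `fit_with_vi_filter`, and `vi_cutoff` are hypothetical. The elastic net itself is fitted by plain coordinate descent with no intercept and fixed penalty parameters, whereas a real analysis would tune them by cross-validation.

```python
import math

def soft_threshold(z, gamma):
    """Soft-thresholding operator arising from the L1 part of the penalty."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0

def elastic_net(X, y, lam, alpha, n_iter=200):
    """Coordinate descent for the objective
    (1/2n)*||y - Xb||^2 + lam*(alpha*||b||_1 + (1 - alpha)/2*||b||_2^2).
    X is a list of rows; no intercept is fitted, so center X and y first."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - Xb, starting from b = 0
    for _ in range(n_iter):
        for j in range(p):
            col = [row[j] for row in X]
            # Correlation of column j with the partial residual (excluding b_j)
            rho = sum(c * (r + c * beta[j]) for c, r in zip(col, resid)) / n
            denom = sum(c * c for c in col) / n + lam * (1.0 - alpha)
            new = soft_threshold(rho, lam * alpha) / denom if denom > 0 else 0.0
            delta = new - beta[j]
            if delta:
                resid = [r - c * delta for c, r in zip(col, resid)]
                beta[j] = new
    return beta

def importance_scores(X, y):
    """ASSUMPTION: stand-in VI score, |Pearson correlation| of each column with y."""
    n = len(y)
    my = sum(y) / n
    sy = math.sqrt(sum((v - my) ** 2 for v in y))
    scores = []
    for j in range(len(X[0])):
        col = [row[j] for row in X]
        mx = sum(col) / n
        sx = math.sqrt(sum((c - mx) ** 2 for c in col))
        cov = sum((c - mx) * (v - my) for c, v in zip(col, y))
        scores.append(abs(cov) / (sx * sy) if sx > 0 and sy > 0 else 0.0)
    return scores

def fit_with_vi_filter(X, y, vi_cutoff, lam, alpha):
    """Keep columns whose score reaches vi_cutoff, then fit the elastic net."""
    scores = importance_scores(X, y)
    keep = [j for j, s in enumerate(scores) if s >= vi_cutoff]
    X_sel = [[row[j] for j in keep] for row in X]
    beta = elastic_net(X_sel, y, lam, alpha) if keep else []
    return keep, beta

# Toy data: the response depends only on the first (centered) feature.
X = [[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]]
y = [2.0, 2.0, -2.0, -2.0]
keep, beta = fit_with_vi_filter(X, y, vi_cutoff=0.5, lam=0.5, alpha=1.0)
print(keep, beta)  # keep == [0], beta == [1.5]
```

In the toy call, the second feature is uncorrelated with the response and is dropped before fitting; the remaining coefficient is shrunk from the least-squares value of 2 to 1.5 by the L1 penalty.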