Srivastava Gopal, Brylinski Michal
Department of Biological Sciences, Louisiana State University, Baton Rouge, LA 70803, USA.
Center for Computation and Technology, Louisiana State University, Baton Rouge, LA 70803, USA.
Nutrients. 2025 Jan 28;17(3):469. doi: 10.3390/nu17030469.
The human gut microbiome is critical to host health because it facilitates essential metabolic processes. Our study presents a data-driven analysis of 312 bacterial species and 154 unique metabolites to improve the understanding of the underlying metabolic processes in gut bacteria. The focus of the study was to create a strategy for generating a theoretical (negative) set for binary classification models that predict the consumption and production of metabolites in the human gut microbiome. Our models achieved median balanced accuracies of 0.74 for consumption predictions and 0.95 for production predictions, highlighting the effectiveness of this approach in generating reliable negative sets. Additionally, we applied kernel principal component analysis for dimensionality reduction. The consumption model with a polynomial kernel and the production model with a radial basis function kernel and 32 reduced features showed median accuracies of 0.58 and 0.67, respectively. This demonstrates that biological information can still be captured, albeit with some loss, even after the number of features is reduced. Furthermore, our models were validated on six previously unseen cases, achieving five correct predictions for consumption and four for production, which is consistent with known biological outcomes. These findings highlight the potential of integrating data-driven approaches with machine learning techniques to deepen our understanding of gut microbiome metabolism. This work provides a foundation for creating bacteria-metabolite datasets that improve machine learning-based predictive tools, with potential applications in developing therapeutic methods targeting gut microbes.
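The abstract reports median balanced accuracy, which matters when the positive and negative sets differ in size. The sketch below is not the authors' code. It is a minimal illustration, assuming the usual definition of balanced accuracy as the mean of sensitivity and specificity, and assuming that the median is taken over repeated evaluation runs. The labels, the per-run scores, and the function name `balanced_accuracy` are all hypothetical.

```python
from statistics import median


def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity (recall on class 1) and specificity (recall on class 0)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


# Hypothetical toy labels: 1 = bacterium consumes the metabolite, 0 = negative set.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]

plain_accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
score = balanced_accuracy(y_true, y_pred)

# Hypothetical scores from three evaluation runs, summarized by their median.
run_scores = [0.5, score, 1.0]
median_score = median(run_scores)

print(f"plain accuracy:    {plain_accuracy:.2f}")
print(f"balanced accuracy: {score:.2f}")
print(f"median over runs:  {median_score:.2f}")
```

In this toy case the plain accuracy is inflated by the larger negative class, while the balanced accuracy penalizes the missed positive. That is one reason a balanced metric is a natural choice when the negative set is generated rather than observed.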