Silva-Andrade Claudia, Hernández Sergio, Saa Pedro, Perez-Rueda Ernesto, Garrido Daniel, Martin Alberto J
Programa de Doctorado en Genómica Integrativa, Vicerrectoria de investigación, Universidad Mayor, Santiago, Chile.
Laboratorio de Redes Biológicas, Centro Científico y Tecnológico de Excelencia Ciencia & Vida, Fundación Ciencia & Vida, Santiago, Chile.
PeerJ. 2025 May 28;13:e19296. doi: 10.7717/peerj.19296. eCollection 2025.
Understanding the behavior of microbial consortia is crucial for predicting metabolite production by microorganisms. Genome-scale network reconstructions enable the computation of the metabolic interactions and specific associations within microbial consortia that underpin the production of different metabolites. In the human gut, butyrate is a central bacterial metabolite that plays a key role within the gut microbiome and affects human health. Despite its importance, computational methods capable of predicting butyrate production as a function of consortium composition are lacking. Here, we present a novel machine-learning approach that leverages automatically generated genome-scale metabolic models to address this gap. Briefly, all consortia of two to 13 members drawn from a pool of 19 bacteria with known genomes, each including at least one of three known butyrate-producing species, were built and their (maximum) butyrate production was simulated. The simulated butyrate production of these consortia, together with network-derived descriptors of each bacterium, was used as training data for various machine-learning models. The performance of the algorithms was evaluated using k-fold cross-validation and new experimental data, yielding a Pearson correlation coefficient exceeding 0.75 between predicted and observed butyrate production in two-member consortia. While predictions were generally worse for consortia with more than two bacteria, the best machine-learning models still outperformed predictions from genome-scale metabolic models alone. Overall, this approach provides a valuable tool and framework for screening promising butyrate-producing consortia on a large scale, guiding experimentation, and, more importantly, predicting metabolite production by consortia.
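To make the size of the consortium search space concrete, the enumeration described in the abstract can be sketched as follows. This is an illustrative sketch only, not the authors' pipeline: the function names and the placeholder species labels are hypothetical, and it assumes that every subset of the pool that satisfies the size and producer constraints counts as one consortium.

```python
from itertools import combinations
from math import comb


def count_consortia(pool_size=19, n_producers=3, min_size=2, max_size=13):
    """Count consortia with at least one producer (hypothetical helper).

    For each size k, take all k-subsets of the pool and subtract the
    k-subsets made only of non-producers.
    """
    non_producers = pool_size - n_producers
    return sum(
        comb(pool_size, k) - comb(non_producers, k)
        for k in range(min_size, max_size + 1)
    )


def enumerate_consortia(members, producers, min_size, max_size):
    """Yield each consortium (as a tuple) that contains at least one producer."""
    producer_set = set(producers)
    for k in range(min_size, max_size + 1):
        for combo in combinations(members, k):
            if producer_set.intersection(combo):
                yield combo


if __name__ == "__main__":
    # Placeholder labels; the actual species names are not listed in the abstract.
    pool = [f"sp{i:02d}" for i in range(1, 7)]
    producers = pool[:2]
    small = list(enumerate_consortia(pool, producers, 2, 3))
    print(len(small), "consortia in the toy pool")
    print(count_consortia(), "consortia for 19 bacteria, 3 producers, sizes 2 to 13")
```

Under these assumptions, the pool sizes stated in the abstract give 442,222 distinct consortia. That figure is a purely combinatorial count derived here, not a number reported by the source, but it indicates why simulated rather than experimental data serve as the training set.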