Biology Department, Woods Hole Oceanographic Institution, Woods Hole, United States.
Luit Consulting, Revere, United States.
eLife. 2024 May 2;13:e85749. doi: 10.7554/eLife.85749.
The reconstruction of complete microbial metabolic pathways using 'omics data from environmental samples remains challenging. Computational pipelines for pathway reconstruction that utilize machine learning methods to predict the presence or absence of KEGG modules in incomplete genomes are lacking. Here, we present MetaPathPredict, a software tool that incorporates machine learning models to predict the presence of complete KEGG modules within bacterial genomic datasets. Using gene annotation data and information from the KEGG module database, MetaPathPredict employs deep learning models to predict the presence of KEGG modules in a genome. MetaPathPredict can be used as a command line tool or as a Python module, and both options are designed to be run locally or on a compute cluster. Benchmarks show that MetaPathPredict makes robust predictions of KEGG module presence within highly incomplete genomes.
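To make the prediction task concrete, here is a minimal sketch of the general idea: encode a genome's gene annotations as a binary presence/absence vector over KEGG Orthology (KO) identifiers, then train a classifier to predict whether a module is present. This is not MetaPathPredict's code or API. The KO vocabulary, the toy training genomes, and the function names `encode`, `train`, and `predict_proba` are all hypothetical, and a small logistic regression stands in for the deep learning models described in the abstract.

```python
import math

# Hypothetical KO vocabulary; a real feature space would cover thousands of KOs.
KO_VOCAB = ["K00001", "K00002", "K00003", "K00004"]

# Toy training set (invented for illustration): each entry is the list of KOs
# annotated in one genome plus a label (1 = module present, 0 = absent).
# Some positive genomes are deliberately "incomplete" (a module gene is missing).
TRAIN = [
    (["K00001", "K00002", "K00003"], 1),
    (["K00001", "K00003"], 1),   # incomplete genome: K00002 not recovered
    (["K00002"], 1),             # incomplete genome: K00001 not recovered
    (["K00003", "K00004"], 0),
    (["K00004"], 0),
    (["K00003"], 0),
]


def encode(kos):
    """Turn a list of KO identifiers into a binary vector over KO_VOCAB."""
    present = set(kos)
    return [1.0 if ko in present else 0.0 for ko in KO_VOCAB]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(weights, bias, kos):
    """Predicted probability that the module is present in this genome."""
    x = encode(kos)
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def train(data, epochs=500, lr=0.5):
    """Fit logistic regression with plain stochastic gradient descent."""
    weights = [0.0] * len(KO_VOCAB)
    bias = 0.0
    for _ in range(epochs):
        for kos, label in data:
            x = encode(kos)
            err = predict_proba(weights, bias, kos) - label
            weights = [w - lr * err * xi for w, xi in zip(weights, x)]
            bias -= lr * err
    return weights, bias


weights, bias = train(TRAIN)

# Score a genome in which only part of the toy module was annotated.
print(predict_proba(weights, bias, ["K00001", "K00003"]))
```

The point of the sketch is the framing: the label describes the complete genome, while the features come from whatever annotations survive in an incomplete one, so the classifier learns to infer module presence from partial evidence.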