Bizzotto Edoardo, Fraulini Sofia, Zampieri Guido, Orellana Esteban, Treu Laura, Campanaro Stefano
Department of Biology, University of Padova, Padova, 35131, Italy.
Environ Microbiome. 2024 Aug 8;19(1):58. doi: 10.1186/s40793-024-00600-6.
In recent years, the number of microbial genomes reconstructed from shotgun sequencing data has increased rapidly, driven by newly developed approaches such as metagenomic binning and single-cell sequencing. However, functional characterization of these genomes by experimental assays is orders of magnitude slower. Consequently, there is a pressing need for fast, automated strategies for the functional classification of microbial genomes.
The present work leverages a suite of supervised machine learning algorithms to predict 86 metabolic and other ecological functions, such as methanotrophy and plastic degradation, starting from widely obtainable microbial genome annotations. Tests performed on independent datasets demonstrated robust performance for most of the considered functions across complete, fragmented, and incomplete genomes, provided that genome completeness was above 70%. Application of the algorithms to the Biogas Microbiome database yielded predictions that were broadly consistent with current biological knowledge and that correctly detected function-related nuances of archaeal genomes. Finally, a case study focused on acetoclastic methanogenesis demonstrated how the developed machine learning models can be refined, or expanded with models describing novel functions of interest.
The resulting tool, MICROPHERRET, incorporates a total of 86 models, one for each tested functional class, and can be applied to high-quality microbial genomes as well as to low-quality genomes derived from metagenomics and single-cell sequencing. MICROPHERRET can thus aid in understanding the functional role of newly generated genomes within their micro-ecological context.
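The "one model per functional class" design described above can be illustrated with a minimal sketch. It is not MICROPHERRET's implementation: the perceptron classifier, the annotation identifiers (`a1`, `m1`, `p1`, ...), the toy training genomes, and all function names below are assumptions chosen for illustration. Each genome is represented as a set of annotation identifiers, one binary classifier is trained per function, and a helper simulates an incomplete genome by randomly discarding annotations.

```python
import random

def train_perceptron(genomes, labels, epochs=20):
    """Train a binary perceptron on presence/absence of annotation IDs."""
    features = set().union(*genomes)
    weights = {f: 0.0 for f in features}
    bias = 0.0
    for _ in range(epochs):
        for annotations, label in zip(genomes, labels):
            score = bias + sum(weights[a] for a in annotations)
            error = label - (1 if score > 0 else 0)
            if error:  # update only on misclassified genomes
                bias += error
                for a in annotations:
                    weights[a] += error
    return weights, bias

def has_function(model, annotations):
    """Return True if the model assigns the function to the genome."""
    weights, bias = model
    # Annotations never seen during training contribute nothing.
    return bias + sum(weights.get(a, 0.0) for a in annotations) > 0

def simulate_incompleteness(annotations, completeness, rng):
    """Keep a random fraction of annotations to mimic an incomplete genome."""
    keep = round(len(annotations) * completeness)
    return set(rng.sample(sorted(annotations), keep))

# Toy training set (hypothetical): (annotation IDs, known functions).
TRAINING = [
    ({"a1", "a2", "m1", "m2"}, {"methanotrophy"}),
    ({"a1", "a3", "m1", "m2"}, {"methanotrophy"}),
    ({"a2", "a3", "p1", "p2"}, {"plastic_degradation"}),
    ({"a1", "a2", "a3", "p1", "p2"}, {"plastic_degradation"}),
    ({"a1", "a2", "a3"}, set()),
    ({"a2", "a3"}, set()),
]

genomes = [annotations for annotations, _ in TRAINING]
function_names = sorted(set().union(*(funcs for _, funcs in TRAINING)))

# One independent binary model per functional class.
models = {
    name: train_perceptron(genomes, [int(name in funcs) for _, funcs in TRAINING])
    for name in function_names
}

def predict_functions(annotations):
    """List every function whose model fires for this genome."""
    return [name for name in function_names if has_function(models[name], annotations)]

if __name__ == "__main__":
    new_genome = {"a1", "a3", "m1", "m2", "x9"}
    print(predict_functions(new_genome))
    partial = simulate_incompleteness(new_genome, 0.7, random.Random(0))
    print(sorted(partial), predict_functions(partial))
```

Because each function has its own model, adding a new function of interest (as in the acetoclastic methanogenesis case study) would amount, in this sketch, to training one more entry in `models` without touching the others.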