Herazo-Álvarez Jair, Mora Marco, Cuadros-Orellana Sara, Vilches-Ponce Karina, Hernández-García Ruber
Doctorado en Modelamiento Matemático Aplicado, Universidad Católica del Maule, Talca, Maule 3480564, Chile.
Laboratory of Technological Research in Pattern Recognition (LITRP), Universidad Católica del Maule, Talca, Maule 3480564, Chile.
Brief Bioinform. 2025 Mar 4;26(2). doi: 10.1093/bib/bbaf065.
One of the main goals of metagenomic studies is to describe the taxonomic diversity of microbial communities. A crucial step in metagenomic analysis is metagenomic binning, which involves the (supervised) classification or (unsupervised) clustering of metagenomic sequences. Various machine learning models have been applied to this task. This review details the contributions of artificial neural networks (ANNs) to metagenomic binning, covering supervised, unsupervised, and semi-supervised approaches. Thirty-four ANN-based binning tools are systematically compared in terms of their architectures, input features, datasets, advantages, disadvantages, and other relevant aspects. The findings reveal that deep learning approaches, such as convolutional neural networks and autoencoders, achieve higher accuracy and scalability than traditional methods. Gaps in benchmarking practices are highlighted, and future directions are proposed, including standardized datasets and the optimization of architectures for third-generation sequencing. This review supports researchers in identifying trends and selecting suitable tools for the metagenomic binning problem.