Levy Karin Eli, Steinegger Martin
ELKMO, Copenhagen 2720, Denmark.
School of Biological Sciences, Seoul National University, Seoul 08826, Republic of Korea.
Natl Sci Rev. 2025 Feb 19;12(6):nwaf056. doi: 10.1093/nsr/nwaf056. eCollection 2025 Jun.
Recent years have seen remarkable progress in the development of deep-learning (DL) tools for the analysis of biological data, the most prominent example being AlphaFold2 for accurate protein structure prediction. DL-based tools are especially useful for identifying patterns and connections within sparsely labeled datasets. This makes them essential for the analysis of metagenomic data, which is mostly unannotated and bears little sequence similarity to known genes and proteins. In this review, we present 12 tools that we consider to offer novel capabilities for metagenomic analysis through noteworthy DL techniques. The review is intended as a starting point for any data scientist looking to apply advanced methods to explore metagenomic datasets. For each DL-based tool, we present its computational principles, followed by relevant examples of its application where possible and a note on its limitations.