Mathieu Alban, Leclercq Mickael, Sanabria Melissa, Perin Olivier, Droit Arnaud
Computational Biology Laboratory, CHU de Québec - Université Laval Research Centre, Québec City, QC, Canada.
Université Côte d'Azur, CNRS, INRIA, I3S, Nice, France.
Front Microbiol. 2022 Mar 14;13:811495. doi: 10.3389/fmicb.2022.811495. eCollection 2022.
Shotgun sequencing of environmental DNA (i.e., metagenomics) has revolutionized environmental microbiology by allowing the characterization of all microorganisms in a single sequencing experiment. To identify microbes in terms of taxonomy and biological activity, the sequenced reads must be aligned to known microbial genomes or genes. However, current alignment methods are limited in speed and can produce a significant number of false positives when detecting bacterial species, or false negatives in specific cases (detection of viruses, plasmids, and genes). Moreover, recent advances in metagenomics have enabled the reconstruction of new genomes using binning strategies, but these genomes, which are not yet fully characterized, are not used in classic approaches, whereas machine learning and deep learning methods can use them as models. In this article, we review the different methods and their efficiency in improving the annotation of metagenomic sequences. Deep learning models have reached the performance of the widely used k-mer alignment-based tools, with better accuracy in certain cases; however, they have yet to demonstrate their robustness across the variety of environmental samples and across the rapid expansion of accessible genomes in databases.
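As a rough illustration of the k-mer matching idea behind the alignment-based tools mentioned in the abstract, the following toy sketch in Python assigns a read to the reference that shares the most k-mers with it. It is not the method of any specific tool reviewed in the article. The function names, the two short reference sequences, the taxon labels, and the choice of k = 5 are all hypothetical, and the sketch ignores reverse complements, sequencing errors, and taxonomic hierarchy.

```python
from collections import Counter


def kmers(seq, k):
    """Return the set of all length-k substrings of seq (upper-cased)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_index(references, k):
    """Map each k-mer to the set of reference labels that contain it."""
    index = {}
    for label, genome in references.items():
        for km in kmers(genome, k):
            index.setdefault(km, set()).add(label)
    return index


def classify_read(read, index, k, min_hits=1):
    """Return the label sharing the most k-mers with the read, or None."""
    votes = Counter()
    for km in kmers(read, k):
        for label in index.get(km, ()):
            votes[label] += 1
    if not votes:
        return None  # no k-mer of the read occurs in any reference
    label, hits = votes.most_common(1)[0]
    return label if hits >= min_hits else None


# Hypothetical toy references; real tools index full genomes with larger k.
references = {
    "taxon_A": "ATGCGTACGTTAGCCGATAGCTAGGCTA",
    "taxon_B": "TTGACCGGTATCCGGAATTCCGGTTAAC",
}
K = 5
index = build_index(references, K)

if __name__ == "__main__":
    for read in ("CGTACGTTAGC", "GGTATCCGGAAT", "AAAAAAAAAA"):
        print(read, "->", classify_read(read, index, K))
```

A read with no k-mer in the index is left unclassified. This mirrors, in a very simplified way, why sequences absent from reference databases can go undetected by approaches of this kind.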