Chakoory Oshma, Comtet-Marre Sophie, Peyret Pierre
Université Clermont Auvergne, INRAE, MEDIS, F-63000 Clermont-Ferrand, France.
NAR Genom Bioinform. 2022 Sep 21;4(3):lqac070. doi: 10.1093/nargab/lqac070. eCollection 2022 Sep.
Metagenomic classifiers are widely used for taxonomic profiling of metagenomic data and for estimating the relative abundance of taxa. Small subunit rRNA genes are a gold standard for the phylogenetic resolution of microbiota, although the power of this marker depends on using the full-length gene. We aimed to identify tools that can efficiently achieve taxonomic resolution down to the species level. To this end, we benchmarked the performance and accuracy of rRNA-specialized versus general-purpose read mappers, reference-targeted assemblers and taxonomic classifiers. We then combined the best tools (BBTools, FastQC, SortMeRNA, MetaRib, EMIRGE, VSEARCH, BBMap and QIIME 2's Sklearn classifier) into a pipeline called RiboTaxa. On metagenomics datasets, RiboTaxa gave the best results compared with other tools (i.e. Kraken2, Centrifuge, METAXA2, phyloFlash, SPINGO, BLCA, MEGAN), with precise taxonomic identification and relative abundance description and without false positive detection (F-measure of 100% and 83.7% at the genus and species levels, respectively). On real datasets from various environments (i.e. ocean, soil, human gut) and from different approaches (e.g. metagenomics and gene capture by hybridization), RiboTaxa revealed microbial novelties not discerned by current bioinformatics analyses, opening new biological perspectives in human and environmental health.
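The abstract reports accuracy as an F-measure but does not define it. As a minimal sketch, assuming the conventional balanced F-measure (the harmonic mean of precision and recall), the score for a taxonomic profile could be computed as below. The function name and the counts are hypothetical illustrations and are not taken from the study.

```python
def f_measure(true_positives: int, false_positives: int, false_negatives: int) -> float:
    """Balanced F-measure: harmonic mean of precision and recall.

    Assumes the standard F1 definition; the abstract does not state
    which weighting the benchmark actually used.
    """
    if true_positives == 0:
        # No correctly identified taxa: precision and recall are both zero.
        return 0.0
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts (not from the paper): 8 expected species detected,
# no false positives, 2 expected species missed.
score = f_measure(true_positives=8, false_positives=0, false_negatives=2)
print(f"F-measure: {score:.3f}")
```

Under this definition, a profile with no false positives has a precision of 1, so any F-measure below 100% reflects expected taxa that were not detected.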