Piro Vitor C, Reinert Knut
Department of Mathematics and Computer Science, Freie Universität Berlin, 14195 Berlin, Germany.
NAR Genom Bioinform. 2025 Jul 17;7(3):lqaf094. doi: 10.1093/nargab/lqaf094. eCollection 2025 Sep.
The fast growth of public genomic sequence repositories greatly contributes to the success of metagenomics. However, these repositories are growing faster than the computational resources needed to use them. This challenges current methods, which struggle to take full advantage of massive and fast data generation. We propose a generational leap in performance and usability with ganon2, a sequence classification method that performs taxonomic binning and profiling for metagenomics analysis. It indexes large datasets with a small memory footprint while maintaining fast, sensitive, and precise classification results. Based on the full NCBI RefSeq and its subsets, ganon2 indices are on average 50% smaller than those of state-of-the-art methods. Using 16 simulated samples from various studies, including the CAMI 1+2 challenge, ganon2 achieved up to 0.15 higher median F1-score in taxonomic binning. In profiling, improvements in the median F1-score are up to 0.35, keeping a balanced L1-norm error in the abundance estimation. ganon2 is one of the fastest tools evaluated and enables the use of larger, more diverse, and up-to-date reference sets in daily microbiome analysis, improving the resolution of results. The code is open-source and available with documentation at https://github.com/pirovc/ganon.