ganon2: up-to-date and scalable metagenomics analysis.

Authors

Piro Vitor C, Reinert Knut

Affiliation

Department of Mathematics and Computer Science, Freie Universität Berlin, 14195 Berlin, Germany.

Publication

NAR Genom Bioinform. 2025 Jul 17;7(3):lqaf094. doi: 10.1093/nargab/lqaf094. eCollection 2025 Sep.

DOI:10.1093/nargab/lqaf094

PMID:40677913

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12267982/

Abstract

The fast growth of public genomic sequence repositories greatly contributes to the success of metagenomics. However, they are growing at a faster pace than the computational resources to use them. This challenges current methods, which struggle to take full advantage of massive and fast data generation. We propose a generational leap in performance and usability with ganon2, a sequence classification method that performs taxonomic binning and profiling for metagenomics analysis. It indexes large datasets with a small memory footprint, maintaining fast, sensitive, and precise classification results. Based on the full NCBI RefSeq and its subsets, ganon2 indices are on average 50% smaller than state-of-the-art methods. Using 16 simulated samples from various studies, including the CAMI 1+2 challenge, ganon2 achieved up to 0.15 higher median F1-score in taxonomic binning. In profiling, improvements in the F1-score median are up to 0.35, keeping a balanced L1-norm error in the abundance estimation. ganon2 is one of the fastest tools evaluated and enables the use of larger, more diverse, and up-to-date reference sets in daily microbiome analysis, improving the resolution of results. The code is open-source and available with documentation at https://github.com/pirovc/ganon.
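The abstract reports results as a median F1-score across samples and as an L1-norm error on abundance estimates. The sketch below shows the textbook definitions of these two metrics for a single taxonomic profile, assuming presence/absence of taxa for F1 and relative abundances (each profile summing to 1) for L1. It is not the authors' evaluation code: the benchmarks in the paper may apply per-rank evaluation, abundance thresholds, or CAMI-specific conventions not modeled here, and the helper names `f1_score` and `l1_error` are hypothetical.

```python
from statistics import median


def f1_score(predicted: set[str], truth: set[str]) -> float:
    """Harmonic mean of precision and recall over predicted vs. true taxa.

    Assumption: a taxon counts as a true positive if it appears in both sets
    (presence/absence only, no abundance threshold, single rank).
    """
    tp = len(predicted & truth)
    if tp == 0:
        return 0.0
    precision = tp / len(predicted)
    recall = tp / len(truth)
    return 2 * precision * recall / (precision + recall)


def l1_error(predicted: dict[str, float], truth: dict[str, float]) -> float:
    """Sum of absolute differences between two relative-abundance profiles.

    Taxa missing from one profile are treated as abundance 0. When both
    profiles sum to 1, the result lies between 0 (identical) and 2 (disjoint).
    """
    taxa = predicted.keys() | truth.keys()
    return sum(abs(predicted.get(t, 0.0) - truth.get(t, 0.0)) for t in taxa)


# Hypothetical toy sample: one true taxon (C) is missed, one false taxon (D) is reported.
truth_profile = {"A": 0.5, "B": 0.3, "C": 0.2}
predicted_profile = {"A": 0.4, "B": 0.4, "D": 0.2}

sample_f1 = f1_score(set(predicted_profile), set(truth_profile))  # precision = recall = 2/3
sample_l1 = l1_error(predicted_profile, truth_profile)            # 0.1 + 0.1 + 0.2 + 0.2

# A "median F1-score" summarizes per-sample scores across a benchmark set.
per_sample_f1 = [0.5, sample_f1, 0.8]
print(round(sample_f1, 3), round(sample_l1, 3), round(median(per_sample_f1), 3))
```

Taking the median rather than the mean keeps a single outlier sample from dominating the summary, which is one reason medians are common when comparing tools over heterogeneous simulated datasets.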
