Australian Centre for Ecogenomics, School of Chemistry and Molecular Biosciences, The University of Queensland, St Lucia, QLD 4072, Australia.
Research Computing Center, The University of Queensland, St Lucia, QLD 4072, Australia.
Bioinformatics. 2022 Nov 30;38(23):5315-5316. doi: 10.1093/bioinformatics/btac672.
The Genome Taxonomy Database (GTDB) and the associated taxonomic classification toolkit (GTDB-Tk) have been widely adopted by the microbiology community. However, the growing size of the GTDB bacterial reference tree has resulted in GTDB-Tk requiring substantial amounts of memory (∼320 GB), which limits its adoption and ease of use. Here, we present an update to GTDB-Tk that uses a divide-and-conquer approach: user genomes are first placed into a bacterial reference tree containing family-level representatives and then placed into an appropriate class-level subtree comprising species representatives. This substantially reduces the memory requirements of GTDB-Tk while having minimal impact on classification.
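The two-stage routing described above can be illustrated with a toy sketch. It is not GTDB-Tk's implementation: a nearest-neighbour match on aligned marker strings (Hamming distance) stands in for maximum-likelihood tree placement, and the function names, reference labels and alignments are hypothetical. The point it shows is that stage 2 only consults the species representatives of one class, so the full species-level tree never needs to be held at once.

```python
"""Toy sketch of divide-and-conquer genome placement (hypothetical names).

Assumption: nearest-neighbour matching on aligned marker strings stands in
for real phylogenetic placement into a reference tree.
"""
from typing import Dict, Tuple


def hamming(a: str, b: str) -> int:
    """Count mismatching columns between two equal-length alignments."""
    if len(a) != len(b):
        raise ValueError("alignments must have equal length")
    return sum(x != y for x, y in zip(a, b))


def nearest(query: str, refs: Dict[str, str]) -> str:
    """Return the reference label whose alignment is closest to the query."""
    return min(refs, key=lambda label: hamming(query, refs[label]))


def classify(
    query: str,
    backbone: Dict[str, str],
    family_to_class: Dict[str, str],
    class_subtrees: Dict[str, Dict[str, str]],
) -> Tuple[str, str, str]:
    # Stage 1: place the query among family-level representatives only.
    family = nearest(query, backbone)
    # Route to the class-level subtree that contains that family.
    cls = family_to_class[family]
    # Stage 2: place among the species representatives of that class only.
    species = nearest(query, class_subtrees[cls])
    return family, cls, species


# Hypothetical toy references using GTDB-style rank prefixes.
BACKBONE = {"f__A": "AAAA", "f__B": "CCCC"}
FAMILY_TO_CLASS = {"f__A": "c__X", "f__B": "c__Y"}
CLASS_SUBTREES = {
    "c__X": {"s__A1": "AAAT", "s__A2": "AATT"},
    "c__Y": {"s__B1": "CCCG", "s__B2": "CCGG"},
}

if __name__ == "__main__":
    print(classify("AAAC", BACKBONE, FAMILY_TO_CLASS, CLASS_SUBTREES))
    # → ('f__A', 'c__X', 's__A1')
```

In a memory-conscious version, each entry of `CLASS_SUBTREES` would be loaded lazily from disk only when stage 1 routes a genome to it.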
GTDB-Tk is implemented in Python and licenced under the GNU General Public Licence v3.0. Source code and documentation are available at: https://github.com/ecogenomics/gtdbtk.
Supplementary data are available at Bioinformatics online.