Sarhan Mohamed S, Filosi Michele, Maixner Frank, Fuchsberger Christian
Institute for Biomedicine, Eurac Research, Bolzano 39100, Italy (Affiliated institute with Lübeck University, Lübeck, Germany).
Department CIBIO, University of Trento, Trento 38123, Italy.
bioRxiv. 2024 Mar 27:2024.03.22.586347. doi: 10.1101/2024.03.22.586347.
Taxonomic identification and diversity analysis of ecological samples have become routine in many research and industrial fields. While DNA-barcoding marker-gene approaches were once prevalent, the decreasing cost of next-generation sequencing has made metagenomic shotgun sequencing more popular and feasible. In contrast to DNA barcoding, metagenomic shotgun sequencing allows in-depth characterization of both structural and functional diversity. However, analysis of such data is still considered a hurdle due to the absence of taxa-specific databases. Here we present taxonize-gb, a command-line software tool that extracts the portions of the GenBank non-redundant nucleotide and protein databases related to one or more input taxonomy identifiers. Our tool allows the creation of taxa-specific reference databases tailored to specific research questions, which reduces search times and therefore represents a practical solution for researchers who analyze large metagenomic datasets on a regular basis. Taxonize-gb is an open-source, Python-based command-line tool freely available for installation at https://pypi.org/project/taxonize-gb/ and on GitHub at https://github.com/msabrysarhan/taxonize_genbank. It is released under the Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0).
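The general idea of building a taxa-specific subset from a large reference database can be illustrated with a minimal Python sketch. This is not the taxonize-gb implementation or its interface; the function names, the toy taxonomy, and the toy accessions are all hypothetical. It assumes a child-to-parent taxonomy map and an accession-to-taxonomy-identifier map are already loaded in memory: it first collects every taxon below the requested identifiers, then keeps only the FASTA records whose accession maps to one of those taxa.

```python
from collections import defaultdict, deque


def collect_descendants(parent_of, root_taxids):
    """Return the root taxids plus every taxid below them.

    parent_of is a hypothetical {child_taxid: parent_taxid} mapping.
    """
    children = defaultdict(list)
    for child, parent in parent_of.items():
        if child != parent:  # skip a self-parented root node
            children[parent].append(child)

    selected = set()
    queue = deque(root_taxids)
    while queue:  # breadth-first walk down the tree
        taxid = queue.popleft()
        if taxid in selected:
            continue
        selected.add(taxid)
        queue.extend(children[taxid])
    return selected


def filter_fasta(lines, accession_to_taxid, wanted_taxids):
    """Yield FASTA lines of records whose accession maps to a wanted taxid."""
    keep = False
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            accession = fields[0] if fields else ""
            keep = accession_to_taxid.get(accession) in wanted_taxids
        if keep:
            yield line


# Toy data (illustrative only, not a real taxonomy or real accessions).
toy_parents = {1: 1, 2: 1, 1224: 2, 562: 1224, 2157: 1}
toy_acc2taxid = {"ACC1": 562, "ACC2": 2157}
toy_fasta = [
    ">ACC1 toy record under taxid 562",
    "ACGT",
    ">ACC2 toy record under taxid 2157",
    "GGCC",
]

wanted = collect_descendants(toy_parents, [2])
subset = list(filter_fasta(toy_fasta, toy_acc2taxid, wanted))
```

Here `subset` contains only the header and sequence of the `ACC1` record, because taxid 562 sits below the requested taxid 2 in the toy tree while 2157 does not. A real database is far too large for in-memory lists, so a practical tool would stream compressed files instead of holding them in memory.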