Bakta: rapid and standardized annotation of bacterial genomes via alignment-free sequence identification.

Affiliation

Bioinformatics and Systems Biology, Justus Liebig University Giessen, Giessen 35392, Germany.

Publication information

Microb Genom. 2021 Nov;7(11). doi: 10.1099/mgen.0.000685.

Abstract

Command-line annotation software tools have continuously gained popularity compared to centralized online services due to the worldwide increase of sequenced bacterial genomes. However, results of existing command-line software pipelines heavily depend on taxon-specific databases or sufficiently well annotated reference genomes. Here, we introduce Bakta, a new command-line software tool for the robust, taxon-independent, thorough and, nonetheless, fast annotation of bacterial genomes. Bakta conducts a comprehensive annotation workflow including the detection of small proteins taking into account replicon metadata. The annotation of coding sequences is accelerated via an alignment-free sequence identification approach that in addition facilitates the precise assignment of public database cross-references. Annotation results are exported in GFF3 and International Nucleotide Sequence Database Collaboration (INSDC)-compliant flat files, as well as comprehensive JSON files, facilitating automated downstream analysis. We compared Bakta to other rapid contemporary command-line annotation software tools in both targeted and taxonomically broad benchmarks including isolates and metagenomic-assembled genomes. We demonstrated that Bakta outperforms other tools in terms of functional annotations, the assignment of functional categories and database cross-references, whilst providing comparable wall-clock runtimes. Bakta is implemented in Python 3 and runs on MacOS and Linux systems. It is freely available under a GPLv3 license at https://github.com/oschwengers/bakta. An accompanying web version is available at https://bakta.computational.bio.
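To illustrate the idea of alignment-free sequence identification mentioned in the abstract, here is a minimal sketch that identifies a protein by an exact-match digest lookup instead of an alignment. The abstract does not describe the mechanism in detail, so everything here is an assumption made for illustration: the choice of SHA-256, the helper names `normalize`, `digest`, and `identify`, the toy `REFERENCE` table, and the `EXAMPLE:` cross-reference identifiers are all hypothetical and are not taken from Bakta's code or database.

```python
import hashlib
from typing import Optional

def normalize(protein_seq: str) -> str:
    """Upper-case the sequence and drop whitespace and a trailing stop symbol."""
    return protein_seq.strip().upper().rstrip("*")

def digest(protein_seq: str) -> str:
    """Return a fixed-length hex digest of the normalized sequence.

    SHA-256 is an assumption for this sketch; any stable hash would do.
    """
    return hashlib.sha256(normalize(protein_seq).encode("ascii")).hexdigest()

# Hypothetical reference table: digest -> annotation record.
# A real tool would back this with an on-disk database.
REFERENCE = {
    digest("MKTAYIAKQR"): {
        "length": 10,
        "product": "example protein A",
        "xrefs": ["EXAMPLE:0001"],
    },
    digest("MSDNELLKRW"): {
        "length": 10,
        "product": "example protein B",
        "xrefs": ["EXAMPLE:0002"],
    },
}

def identify(protein_seq: str, reference: dict) -> Optional[dict]:
    """Look up a sequence by digest; no alignment is computed.

    The stored length is compared as a cheap guard against digest collisions.
    Returns the annotation record, or None if the sequence is unknown.
    """
    hit = reference.get(digest(protein_seq))
    if hit is not None and hit["length"] == len(normalize(protein_seq)):
        return hit
    return None

if __name__ == "__main__":
    for seq in ("mktayiakqr*", "MKTAYIAKQA"):
        record = identify(seq, REFERENCE)
        print(seq, "->", record["product"] if record else "no exact match")
```

A lookup like this costs constant time per sequence and, because it only matches identical sequences, each hit maps unambiguously to the cross-references stored for that exact sequence. Sequences without an exact match would still need a slower homology search, which this sketch leaves out.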
