DFAST_QC: quality assessment and taxonomic identification tool for prokaryotic genomes.

Author information

Elmanzalawi Mohamed, Fujisawa Takatomo, Mori Hiroshi, Nakamura Yasukazu, Tanizawa Yasuhiro

Affiliations

Department of Genetics, School of Life Science, The Graduate University for Advanced Studies (SOKENDAI), Mishima, 411-8540, Japan.

Department of Informatics, National Institute of Genetics, Mishima, 411-8540, Japan.

Publication information

BMC Bioinformatics. 2025 Jan 7;26(1):3. doi: 10.1186/s12859-024-06030-y.

Abstract

BACKGROUND

Accurate taxonomic classification in genome databases is essential for reliable biological research and effective data sharing. Mislabeling or inaccuracies in genome annotations can lead to incorrect scientific conclusions and hinder the reproducibility of research findings. Despite advances in genome analysis techniques, challenges persist in ensuring precise and reliable taxonomic assignments. Existing tools for genome verification often involve extensive computational resources or lengthy processing times, which can limit their accessibility and scalability for large-scale projects. There is a need for more efficient, user-friendly solutions that can handle diverse datasets and provide accurate results with minimal computational demands. This work aimed to address these challenges by introducing a novel tool that enhances taxonomic accuracy, offers a user-friendly interface, and supports large-scale analyses.

RESULTS

We introduce DFAST_QC, a novel tool for quality control and taxonomic classification of prokaryotic genomes, available as both a command-line tool and a web service. DFAST_QC quickly identifies species under the NCBI and GTDB taxonomies by combining genome-distance calculations using Mash with average nucleotide identity (ANI) calculations using skani. We evaluated DFAST_QC's performance in species identification and found it to be highly consistent with existing taxonomic standards, successfully identifying species across diverse datasets. In several cases, DFAST_QC identified potential mislabeling of species names in public databases and highlighted discrepancies in current classifications, demonstrating its capability to uncover errors and enhance taxonomic accuracy. Additionally, the tool's efficient design allows it to run on local machines with minimal computational requirements, making it a practical choice for large-scale genome projects.
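The two-stage idea described above (a fast genome-distance prefilter followed by an ANI comparison against the shortlisted references) can be illustrated with a minimal sketch. This is not DFAST_QC's actual code: the function names, the `Reference` type, the candidate limit of 10, and the 95% ANI species cutoff are all assumptions made for illustration, and the distance and ANI values are supplied as plain dictionaries rather than computed by Mash or skani.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Reference:
    """Hypothetical reference genome record (not a DFAST_QC type)."""
    accession: str
    species: str


def prefilter_by_distance(distances: Dict[str, float],
                          max_candidates: int = 10) -> List[str]:
    """Stage 1: keep the references closest to the query.

    `distances` maps reference accession -> sketch-based genome distance
    (smaller means more similar). The limit of 10 is an assumed default.
    """
    ranked = sorted(distances.items(), key=lambda item: item[1])
    return [accession for accession, _ in ranked[:max_candidates]]


def assign_species(candidates: List[str],
                   ani_values: Dict[str, float],
                   references: Dict[str, Reference],
                   ani_threshold: float = 95.0) -> Optional[Tuple[str, float]]:
    """Stage 2: pick the candidate with the highest ANI.

    Returns (species, ANI) when the best hit reaches `ani_threshold`,
    otherwise None. The 95% cutoff is an assumed, commonly used species
    boundary, not a value taken from the abstract.
    """
    best: Optional[Tuple[str, float]] = None
    for accession in candidates:
        ani = ani_values.get(accession)
        if ani is None:
            continue  # no ANI was computed for this candidate
        if best is None or ani > best[1]:
            best = (accession, ani)
    if best is None or best[1] < ani_threshold:
        return None
    return references[best[0]].species, best[1]


# Made-up values for illustration only.
demo_refs = {
    "REF_A": Reference("REF_A", "Species alpha"),
    "REF_B": Reference("REF_B", "Species beta"),
}
demo_candidates = prefilter_by_distance({"REF_A": 0.02, "REF_B": 0.15})
print(assign_species(demo_candidates, {"REF_A": 97.8, "REF_B": 84.1}, demo_refs))
```

The point of the split is cost: the distance step is cheap enough to scan a large reference collection, so the more precise ANI comparison only has to run on a handful of candidates.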

CONCLUSIONS

DFAST_QC is a reliable and efficient tool for accurate taxonomic identification and genome quality control, well-suited for large-scale genomic studies. Its compatibility with limited-resource environments, combined with its user-friendly design, ensures seamless integration into existing workflows. DFAST_QC's ability to refine species assignments in public databases highlights its value as a complementary tool for maintaining and enhancing the accuracy of taxonomic data in genomic research. The web version is available at https://dfast.ddbj.nig.ac.jp/dqc/submit/, and the source code for local use can be found at https://github.com/nigyta/dfast_qc.
