BLAST-QC: automated analysis of BLAST results.

Authors

Torkian Behzad, Hann Spencer, Preisner Eva, Norman R Sean

Affiliation

Department of Environmental Health Sciences, University of South Carolina, 921 Assembly Street, Columbia, SC, 29208, USA.

Publication

Environ Microbiome. 2020 Aug 12;15(1):15. doi: 10.1186/s40793-020-00361-y.

DOI:10.1186/s40793-020-00361-y
PMID:33902722
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8066848/

Abstract

BACKGROUND

The Basic Local Alignment Search Tool (BLAST) from NCBI is the preferred utility for sequence alignment and identification in bioinformatics and genomics research. Among researchers using NCBI's BLAST software, it is well known that analyzing the results of a large BLAST search can be tedious and time-consuming. Furthermore, given the recent discussions over the effects of parameters such as '-max_target_seqs' on the BLAST heuristic search process, the use of these search options is questionable. This leaves a stand-alone parser as one of the few options for condensing these large datasets, and with few available for download online, researchers are left to create a specialized piece of software whenever they need to analyze BLAST results. The need for a streamlined and fast script that solves these issues and can be easily integrated into a variety of bioinformatics and genomics workflows was the initial motivation for developing this software.

RESULTS

In this study, we demonstrate the effectiveness of BLAST-QC for analysis of BLAST results and its advantages over the other available options. Using genetic sequence data from our bioinformatic workflows, we establish BLAST-QC's superior runtime when compared to existing parsers developed with the commonly used BioPerl and BioPython modules, as well as to C and Java implementations of the BLAST-QC program. We discuss the '-max_target_seqs' parameter, its usage and the controversy surrounding it, and offer a solution by demonstrating that our software can provide the functionality this parameter was assumed to produce, as well as a variety of other parsing options. Executions of the script on example datasets are given, demonstrating the implemented functionality and providing test cases for the program. BLAST-QC is designed to be integrated into existing software, and we establish its effectiveness as a module of workflows or other processes.

CONCLUSIONS

BLAST-QC provides the community with a simple, lightweight and portable Python script that allows for easy quality control of BLAST results while avoiding the drawbacks of other options. These drawbacks include the uncertain results of applying the -max_target_seqs parameter and the cumbersome dependencies of options such as BioPerl and Java, which add complexity and run time when processing large sequence datasets. BLAST-QC is ideal for use in the high-throughput workflows and pipelines common in bioinformatic and genomic research, and the script has been designed for portability and easy integration into whatever processes the user may be running.
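
To make the idea of post-search filtering concrete, the following is a minimal Python sketch of keeping the best n hits per query after the search has finished, rather than relying on -max_target_seqs during the search. It is not the BLAST-QC implementation: the function name `top_hits`, its thresholds, and the choice of BLAST+ tabular input (-outfmt 6, default columns) are assumptions made for illustration only.

```python
import csv
import io
from collections import defaultdict

# Default column order of BLAST+ tabular output (-outfmt 6).
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def top_hits(handle, n=1, max_evalue=1e-5, min_identity=0.0):
    """Return the n best hits per query, ranked by bit score.

    Hypothetical helper for illustration; not part of BLAST-QC.
    """
    hits = defaultdict(list)
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        hit = dict(zip(COLUMNS, row))
        hit["pident"] = float(hit["pident"])
        hit["evalue"] = float(hit["evalue"])
        hit["bitscore"] = float(hit["bitscore"])
        # Apply the quality thresholds before ranking.
        if hit["evalue"] <= max_evalue and hit["pident"] >= min_identity:
            hits[hit["qseqid"]].append(hit)
    # Sort each query's surviving hits by bit score and keep the first n.
    return {q: sorted(h, key=lambda x: x["bitscore"], reverse=True)[:n]
            for q, h in hits.items()}


# Made-up example rows: three hits for q1, one for q2.
example = io.StringIO(
    "q1\tsA\t98.5\t200\t3\t0\t1\t200\t1\t200\t1e-100\t360\n"
    "q1\tsB\t99.0\t200\t2\t0\t1\t200\t1\t200\t1e-105\t372\n"
    "q1\tsC\t80.0\t50\t10\t0\t1\t50\t1\t50\t0.5\t30.1\n"
    "q2\tsD\t91.0\t150\t13\t0\t1\t150\t1\t150\t1e-50\t190\n"
)
best = top_hits(example, n=1)
for query, ranked in best.items():
    print(query, [h["sseqid"] for h in ranked])
```

Because the ranking happens after the full result set is available, the selection depends only on the reported scores and the chosen thresholds, not on the order in which the search heuristic encountered the database sequences.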


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a514/8066848/a6f398acb8da/40793_2020_361_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a514/8066848/beeebfc2813e/40793_2020_361_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a514/8066848/3de327aaa8a4/40793_2020_361_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a514/8066848/7fac385740b3/40793_2020_361_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a514/8066848/23ae3ea4aa0c/40793_2020_361_Fig5_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a514/8066848/c579075a101b/40793_2020_361_Fig6_HTML.jpg

Similar articles

1
BLAST-QC: automated analysis of BLAST results.
Environ Microbiome. 2020 Aug 12;15(1):15. doi: 10.1186/s40793-020-00361-y.
2
Portable BLAST-like algorithm library and its implementations for command line, Python, and R.
PLoS One. 2023 Nov 30;18(11):e0289693. doi: 10.1371/journal.pone.0289693. eCollection 2023.
3
BpWrapper: BioPerl-based sequence and tree utilities for rapid prototyping of bioinformatics pipelines.
BMC Bioinformatics. 2018 Mar 2;19(1):76. doi: 10.1186/s12859-018-2074-9.
4
Windows .NET Network Distributed Basic Local Alignment Search Toolkit (W.ND-BLAST).
BMC Bioinformatics. 2005 Apr 8;6:93. doi: 10.1186/1471-2105-6-93.
5
G-BLASTN: accelerating nucleotide alignment by graphics processors.
Bioinformatics. 2014 May 15;30(10):1384-91. doi: 10.1093/bioinformatics/btu047. Epub 2014 Jan 24.
6
FACEPAI: a script for fast and consistent environmental DNA processing and identification.
BMC Ecol. 2019 Dec 6;19(1):51. doi: 10.1186/s12898-019-0269-1.
7
Massively Parallel Implementation of Sequence Alignment with Basic Local Alignment Search Tool Using Parallel Computing in Java Library.
J Comput Biol. 2018 Aug;25(8):871-881. doi: 10.1089/cmb.2018.0079. Epub 2018 Jul 13.
8
NOBLAST and JAMBLAST: New Options for BLAST and a Java Application Manager for BLAST results.
Bioinformatics. 2009 Mar 15;25(6):824-6. doi: 10.1093/bioinformatics/btp067. Epub 2009 Jan 29.
9
Watchdog - a workflow management system for the distributed analysis of large-scale experimental data.
BMC Bioinformatics. 2018 Mar 13;19(1):97. doi: 10.1186/s12859-018-2107-4.
10
BLAST+: architecture and applications.
BMC Bioinformatics. 2009 Dec 15;10:421. doi: 10.1186/1471-2105-10-421.

Cited by

1
ProtAlign-ARG: antibiotic resistance gene characterization integrating protein language models and alignment-based scoring.
Sci Rep. 2025 Aug 18;15(1):30174. doi: 10.1038/s41598-025-14545-4.
2
Sequence comparison of the mitochondrial genomes of five caridean shrimps of the infraorder Caridea: phylogenetic implications and divergence time estimation.
BMC Genomics. 2024 Oct 16;25(1):968. doi: 10.1186/s12864-024-10775-4.
3
Museum Skins Enable Identification of Introgression Associated with Cytonuclear Discordance.
Syst Biol. 2024 Sep 5;73(3):579-593. doi: 10.1093/sysbio/syae016.
4
Potential for the anaerobic oxidation of benzene and naphthalene in thermophilic microorganisms from the Guaymas Basin.
Front Microbiol. 2023 Sep 29;14:1279865. doi: 10.3389/fmicb.2023.1279865. eCollection 2023.
5
Improved global protein homolog detection with major gains in function identification.
Proc Natl Acad Sci U S A. 2023 Feb 28;120(9):e2211823120. doi: 10.1073/pnas.2211823120. Epub 2023 Feb 24.

References

1
Reply to the paper: Misunderstood parameters of NCBI BLAST impacts the correctness of bioinformatics workflows.
Bioinformatics. 2019 Aug 1;35(15):2699-2700. doi: 10.1093/bioinformatics/bty1026.
2
Commonly misunderstood parameters of NCBI BLAST and important considerations for users.
Bioinformatics. 2019 Aug 1;35(15):2697-2698. doi: 10.1093/bioinformatics/bty1018.
3
Misunderstood parameter of NCBI BLAST impacts the correctness of bioinformatics workflows.
Bioinformatics. 2019 May 1;35(9):1613-1614. doi: 10.1093/bioinformatics/bty833.
4
Comparative analysis of targeted long read sequencing approaches for characterization of a plant's immune receptor repertoire.
BMC Genomics. 2017 Jul 26;18(1):564. doi: 10.1186/s12864-017-3936-7.
5
Microbial Mat Compositional and Functional Sensitivity to Environmental Disturbance.
Front Microbiol. 2016 Oct 17;7:1632. doi: 10.3389/fmicb.2016.01632. eCollection 2016.
6
Low coverage sequencing of three echinoderm genomes: the brittle star Ophionereis fasciata, the sea star Patiriella regularis, and the sea cucumber Australostichopus mollis.
Gigascience. 2016 May 10;5:20. doi: 10.1186/s13742-016-0125-6. eCollection 2016.
7
Skin Microbiome Surveys Are Strongly Influenced by Experimental Design.
J Invest Dermatol. 2016 May;136(5):947-956. doi: 10.1016/j.jid.2016.01.016. Epub 2016 Jan 29.