iBLAST: Incremental BLAST of new sequences via automated e-value correction.

Affiliations

National Center for Computational Sciences, Oak Ridge National Laboratory, Oak Ridge, TN, United States of America.

Department of Computer Science, Virginia Tech, Blacksburg, VA, United States of America.

Publication information

PLoS One. 2021 Apr 22;16(4):e0249410. doi: 10.1371/journal.pone.0249410. eCollection 2021.

DOI: 10.1371/journal.pone.0249410
PMID: 33886589
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8062096/
Abstract

Search results from local alignment search tools use statistical scores that are sensitive to the size of the database to report the quality of the result. For example, NCBI BLAST reports the best matches using similarity scores and expect values (i.e., e-values) calculated against the database size. Given the astronomical growth in genomics data throughout a genomic research investigation, sequence databases grow as new sequences are continuously being added to these databases. As a consequence, the results (e.g., best hits) and associated statistics (e.g., e-values) for a specific set of queries may change over the course of a genomic investigation. Thus, to update the results of a previously conducted BLAST search to find the best matches on an updated database, scientists must currently rerun the BLAST search against the entire updated database, which translates into irrecoverable and, in turn, wasted execution time, money, and computational resources. To address this issue, we devise a novel and efficient method to redeem past BLAST searches by introducing iBLAST. iBLAST leverages previous BLAST search results to conduct the same query search but only on the incremental (i.e., newly added) part of the database, recomputes the associated critical statistics such as e-values, and combines these results to produce updated search results. Our experimental results and fidelity analyses show that iBLAST delivers search results that are identical to NCBI BLAST at a substantially reduced computational cost, i.e., iBLAST performs (1 + δ)/δ times faster than NCBI BLAST, where δ represents the fraction of database growth. We then present three different use cases to demonstrate that iBLAST can enable efficient biological discovery at a much faster speed with a substantially reduced computational cost.
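
The speedup claim and the idea of e-value correction can be illustrated with a small sketch. It is not the iBLAST implementation. It assumes the standard Karlin-Altschul form E = K·m·n·e^(-λS), in which the e-value is proportional to the database search space n, so an e-value computed against one database size can be rescaled by the ratio of sizes. It also assumes search time grows linearly with database size. The `Hit` record, the function names, and all numbers are hypothetical, and real BLAST statistics involve details such as effective length corrections that are ignored here.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Hit:
    subject_id: str   # identifier of the matched database sequence
    bit_score: float  # alignment score, independent of database size
    evalue: float     # expect value for the database that was searched


def speedup(delta: float) -> float:
    """Work ratio of a full search (1 + delta) to an incremental one (delta).

    delta is the database growth as a fraction of the original size.
    """
    if delta <= 0:
        raise ValueError("delta must be positive")
    return (1 + delta) / delta


def rescale(hit: Hit, searched_size: int, combined_size: int) -> Hit:
    """Rescale an e-value from the searched size to the combined size."""
    return replace(hit, evalue=hit.evalue * combined_size / searched_size)


def merge_hits(old_hits, old_size, new_hits, new_size, max_hits=10):
    """Combine earlier hits with hits from the newly added sequences.

    Both sets of e-values are rescaled to the combined database size,
    then the hits are ranked by e-value (ties broken by higher bit score).
    """
    combined = old_size + new_size
    rescaled = [rescale(h, old_size, combined) for h in old_hits]
    rescaled += [rescale(h, new_size, combined) for h in new_hits]
    return sorted(rescaled, key=lambda h: (h.evalue, -h.bit_score))[:max_hits]


if __name__ == "__main__":
    # Hypothetical earlier results against a database of 1,000,000 residues.
    old = [Hit("seqA", 200.0, 1e-50), Hit("seqB", 80.0, 1e-10)]
    # Hypothetical results against 100,000 newly added residues.
    new = [Hit("seqC", 120.0, 1e-25)]
    for hit in merge_hits(old, 1_000_000, new, 100_000):
        print(hit.subject_id, f"{hit.evalue:.2e}")
    # A database that grew by 25% since the last search.
    print(speedup(0.25))
```

Under these assumptions, a hit found only in the small increment has its e-value scaled up far more than a hit from the original database, which is why the statistics must be recomputed before the two result sets can be ranked together.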


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f598/8062096/52c992332ead/pone.0249410.g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f598/8062096/7f9bc3d8313a/pone.0249410.g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f598/8062096/cfe54fbbe5b7/pone.0249410.g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f598/8062096/817283410f4e/pone.0249410.g004.jpg
