FastBLAST: homology relationships for millions of proteins.

Authors

Price Morgan N, Dehal Paramvir S, Arkin Adam P

Affiliation

Physical Biosciences Division, Lawrence Berkeley National Laboratory, Berkeley, California, USA.

Publication

PLoS One. 2008;3(10):e3589. doi: 10.1371/journal.pone.0003589. Epub 2008 Oct 31.

DOI:10.1371/journal.pone.0003589
PMID:18974889
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2571987/

Abstract

BACKGROUND: All-versus-all BLAST, which searches for homologous pairs of sequences in a database of proteins, is used to identify potential orthologs, to find new protein families, and to provide rapid access to these homology relationships. As DNA sequencing accelerates and data sets grow, all-versus-all BLAST has become computationally demanding.

METHODOLOGY/PRINCIPAL FINDINGS: We present FastBLAST, a heuristic replacement for all-versus-all BLAST that relies on alignments of proteins to known families, obtained from tools such as PSI-BLAST and HMMer. FastBLAST avoids most of the work of all-versus-all BLAST by taking advantage of these alignments and by clustering similar sequences. FastBLAST runs in two stages: the first stage identifies additional families and aligns them, and the second stage quickly identifies the homologs of a query sequence, based on the alignments of the families, before generating pairwise alignments. On 6.53 million proteins from the non-redundant Genbank database ("NR"), FastBLAST identifies new families 25 times faster than all-versus-all BLAST. Once the first stage is completed, FastBLAST identifies homologs for the average query in less than 5 seconds (8.6 times faster than BLAST) and gives nearly identical results. For hits above 70 bits, FastBLAST identifies 98% of the top 3,250 hits per query.
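
The second stage described above, finding likely homologs from family alignments before any pairwise alignment is computed, can be illustrated with a small sketch. This is not the FastBLAST implementation: the record layout (`FamilyHit`), the function names, the use of family-model coordinates, and the `min_overlap` threshold are all assumptions made for illustration. The idea shown is only that two proteins aligned to overlapping regions of the same family are cheap-to-find candidates, so the expensive pairwise alignment can be limited to them.

```python
from collections import defaultdict
from typing import NamedTuple


class FamilyHit(NamedTuple):
    """One alignment of a protein to a family (hypothetical record layout)."""
    protein: str
    family: str
    start: int  # first aligned position in family coordinates, inclusive
    end: int    # last aligned position in family coordinates, inclusive


def build_family_index(hits):
    """Group family alignments by family so members can be looked up quickly."""
    index = defaultdict(list)
    for hit in hits:
        index[hit.family].append(hit)
    return index


def candidate_homologs(query, hits, min_overlap=30):
    """Return proteins sharing an overlapping family region with the query.

    min_overlap is an illustrative threshold, not a value from the paper.
    """
    index = build_family_index(hits)
    candidates = set()
    for q in (h for h in hits if h.protein == query):
        for other in index[q.family]:
            if other.protein == query:
                continue
            # Length of the shared region in family coordinates.
            overlap = min(q.end, other.end) - max(q.start, other.start) + 1
            if overlap >= min_overlap:
                candidates.add(other.protein)
    return sorted(candidates)


# Toy data, invented for this example only.
example_hits = [
    FamilyHit("P1", "famA", 1, 100),
    FamilyHit("P2", "famA", 50, 150),
    FamilyHit("P3", "famA", 90, 200),
    FamilyHit("P4", "famB", 1, 80),
    FamilyHit("P1", "famB", 10, 60),
]

print(candidate_homologs("P1", example_hits))  # ['P2', 'P4']
```

In this toy data, P3 is excluded for query P1 because the two share only 11 family positions, below the assumed threshold. Only the surviving candidates would then be passed to a pairwise aligner.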

CONCLUSIONS/SIGNIFICANCE: FastBLAST enables research groups that do not have supercomputers to analyze large protein sequence data sets. FastBLAST is open source software and is available at http://microbesonline.org/fastblast.
