BLAST and FASTA similarity searching for multiple sequence alignment.

Author

Pearson William R

Affiliation

Department of Biochemistry and Molecular Genetics, University of Virginia School of Medicine, Charlottesville, VA, USA.

Publication

Methods Mol Biol. 2014;1079:75-101. doi: 10.1007/978-1-62703-646-7_5.

DOI: 10.1007/978-1-62703-646-7_5
PMID: 24170396
Abstract

BLAST, FASTA, and other similarity searching programs seek to identify homologous proteins and DNA sequences based on excess sequence similarity. If two sequences share much more similarity than expected by chance, the simplest explanation for the excess similarity is common ancestry, that is, homology. The most effective similarity searches compare protein sequences, rather than DNA sequences, for sequences that encode proteins, and use expectation values, rather than percent identity, to infer homology. The BLAST and FASTA packages of sequence comparison programs provide programs for comparing protein and DNA sequences to protein databases (the most sensitive searches). Protein and translated-DNA comparisons to protein databases routinely allow evolutionary look-back times of 1 to 2 billion years; DNA:DNA searches are 5- to 10-fold less sensitive. BLAST and FASTA can be run on popular web sites, but can also be downloaded and installed on local computers. With local installation, target databases can be customized for the sequence data being characterized. With today's very large protein databases, search sensitivity can also be improved by searching smaller comprehensive databases, for example, a complete protein set from an evolutionarily neighboring model organism. By default, BLAST and FASTA use scoring strategies targeted at distant evolutionary relationships; for comparisons involving short domains or queries, or searches that seek relatively close homologs (e.g., mouse-human), shallower scoring matrices will be more effective. Both BLAST and FASTA provide very accurate statistical estimates, which can be used to reliably identify protein sequences that diverged more than 2 billion years ago.
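
The abstract's advice to rely on expectation values, and its remark that smaller databases can improve sensitivity, can be illustrated with the standard bit-score relation E = m · n · 2^(-S'), where S' is the bit score, m the query length, and n the total database length in residues. The sketch below is not taken from the chapter: the function names and the example lengths are hypothetical, and it ignores the effective-length corrections that real BLAST and FASTA runs apply.

```python
import math


def expect_value(bit_score: float, query_len: int, db_len: int) -> float:
    """Approximate E-value for a bit score: E = m * n * 2^(-S').

    Assumption: raw query and database lengths are used as the search
    space, with no edge-effect (effective length) correction.
    """
    return query_len * db_len * 2.0 ** (-bit_score)


def bits_needed(query_len: int, db_len: int, e_threshold: float = 1e-3) -> float:
    """Bit score required to reach a given E-value threshold."""
    return math.log2(query_len * db_len / e_threshold)


if __name__ == "__main__":
    query_len = 300                   # hypothetical query length (residues)
    small_db = 10**8                  # hypothetical single-proteome-scale database
    large_db = 10**10                 # hypothetical comprehensive database

    for name, db_len in (("small", small_db), ("large", large_db)):
        e = expect_value(50.0, query_len, db_len)
        need = bits_needed(query_len, db_len)
        print(f"{name} db: E(50 bits) = {e:.2e}, bits for E<=1e-3: {need:.1f}")
```

Under these assumptions the same 50-bit alignment has an E-value 100-fold larger in the larger database, which is one way to read the abstract's point that searching a smaller, comprehensive database can make a distant homolog statistically detectable.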

Similar articles

1
BLAST and FASTA similarity searching for multiple sequence alignment.
Methods Mol Biol. 2014;1079:75-101. doi: 10.1007/978-1-62703-646-7_5.
2
Flexible sequence similarity searching with the FASTA3 program package.
Methods Mol Biol. 2000;132:185-219. doi: 10.1385/1-59259-192-2:185.
3
Selecting the Right Similarity-Scoring Matrix.
Curr Protoc Bioinformatics. 2013;43:3.5.1-3.5.9. doi: 10.1002/0471250953.bi0305s43.
4
Database similarity searches.
Methods Mol Biol. 2008;484:361-78. doi: 10.1007/978-1-59745-398-1_24.
5
Finding protein and nucleotide similarities with FASTA.
Curr Protoc Bioinformatics. 2004 Feb;Chapter 3:Unit3.9. doi: 10.1002/0471250953.bi0309s04.
6
Finding Protein and Nucleotide Similarities with FASTA.
Curr Protoc Bioinformatics. 2016 Mar 24;53:3.9.1-3.9.25. doi: 10.1002/0471250953.bi0309s53.
7
Sensitivity and selectivity in protein similarity searches: a comparison of Smith-Waterman in hardware to BLAST and FASTA.
Genomics. 1996 Dec 1;38(2):179-91. doi: 10.1006/geno.1996.0614.
8
Effective protein sequence comparison.
Methods Enzymol. 1996;266:227-58. doi: 10.1016/s0076-6879(96)66017-0.
9
Computing multiple sequence/structure alignments with the T-coffee package.
Curr Protoc Bioinformatics. 2004 Feb;Chapter 3:Unit3.8. doi: 10.1002/0471250953.bi0308s04.
10
Adjusting scoring matrices to correct overextended alignments.
Bioinformatics. 2013 Dec 1;29(23):3007-13. doi: 10.1093/bioinformatics/btt517. Epub 2013 Aug 31.

Cited by

1
Bioinformatics Goes Viral: I. Databases, Phylogenetics and Phylodynamics Tools for Boosting Virus Research.
Viruses. 2024 Sep 6;16(9):1425. doi: 10.3390/v16091425.
2
Analysis of a new phage, KZag1, infecting biofilm of Klebsiella pneumoniae: genome sequence and characterization.
BMC Microbiol. 2024 Jun 14;24(1):211. doi: 10.1186/s12866-024-03355-9.
3
RAMZIS: a bioinformatic toolkit for rigorous assessment of the alterations to glycoprotein composition that occur during biological processes.
Bioinform Adv. 2024 Jan 25;4(1):vbae012. doi: 10.1093/bioadv/vbae012. eCollection 2024.
4
C2H2 Zinc Finger Transcription Factors Associated with Hemoglobinopathies.
J Mol Biol. 2024 Apr 1;436(7):168343. doi: 10.1016/j.jmb.2023.168343. Epub 2023 Nov 2.
5
Creative destruction: New protein folds from old.
Proc Natl Acad Sci U S A. 2022 Dec 27;119(52):e2207897119. doi: 10.1073/pnas.2207897119. Epub 2022 Dec 19.
6
A simple guide to de novo transcriptome assembly and annotation.
Brief Bioinform. 2022 Mar 10;23(2). doi: 10.1093/bib/bbab563.
7
Resurrecting Enzymes by Ancestral Sequence Reconstruction.
Methods Mol Biol. 2022;2397:111-136. doi: 10.1007/978-1-0716-1826-4_7.
8
The fine art of preparing membrane transport proteins for biomolecular simulations: Concepts and practical considerations.
Methods. 2021 Jan;185:3-14. doi: 10.1016/j.ymeth.2020.02.009. Epub 2020 Feb 17.
9
RareLSD: a manually curated database of lysosomal enzymes associated with rare diseases.
Database (Oxford). 2019 Jan 1;2019. doi: 10.1093/database/baz112.
10
Improvement of saffron production using as a bioinoculant under greenhouse conditions.
AIMS Microbiol. 2017 May 22;3(3):354-364. doi: 10.3934/microbiol.2017.3.354. eCollection 2017.