

BLAST and FASTA similarity searching for multiple sequence alignment.

Author information

Pearson William R

Affiliation

Department of Biochemistry and Molecular Genetics, University of Virginia School of Medicine, Charlottesville, VA, USA.

Publication information

Methods Mol Biol. 2014;1079:75-101. doi: 10.1007/978-1-62703-646-7_5.

Abstract

BLAST, FASTA, and other similarity searching programs seek to identify homologous proteins and DNA sequences based on excess sequence similarity. If two sequences share much more similarity than expected by chance, the simplest explanation for the excess similarity is common ancestry, that is, homology. The most effective similarity searches compare protein sequences, rather than DNA sequences, for sequences that encode proteins, and use expectation values, rather than percent identity, to infer homology. The BLAST and FASTA packages of sequence comparison programs provide programs for comparing protein and DNA sequences to protein databases (the most sensitive searches). Protein and translated-DNA comparisons to protein databases routinely allow evolutionary look-back times of 1 to 2 billion years; DNA:DNA searches are 5- to 10-fold less sensitive. BLAST and FASTA can be run on popular web sites, but can also be downloaded and installed on local computers. With local installation, target databases can be customized for the sequence data being characterized. With today's very large protein databases, search sensitivity can also be improved by searching smaller comprehensive databases, for example, a complete protein set from an evolutionarily neighboring model organism. By default, BLAST and FASTA use scoring strategies targeted at distant evolutionary relationships; for comparisons involving short domains or queries, or searches that seek relatively close homologs (e.g., mouse-human), shallower scoring matrices will be more effective. Both BLAST and FASTA provide very accurate statistical estimates, which can be used to reliably identify protein sequences that diverged more than 2 billion years ago.
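The link between database size and sensitivity can be illustrated with the standard Karlin-Altschul relation between a bit score S' and its expectation value, E = m · n · 2^(-S'), where m is the query length and n the database length in residues. The sketch below is only an illustration under stated assumptions: the function name, the query length, the bit score, and both database sizes are hypothetical, and the edge-effect corrections that real programs apply to m and n are ignored. FASTA in particular estimates its statistics from the observed score distribution, so this formula approximates rather than reproduces its output.

```python
import math


def expect_value(bit_score: float, query_len: int, db_len: int) -> float:
    """Expected number of chance alignments with at least this bit score.

    Uses E = m * n * 2**(-S'), with m = query_len and n = db_len (residues).
    Simplification: no effective-length (edge-effect) correction is applied.
    """
    return query_len * db_len * 2.0 ** (-bit_score)


# Hypothetical values chosen only for illustration.
query_len = 300            # residues in the query protein
bit_score = 42.0           # bit score of one particular alignment
large_db = 10_000_000_000  # residues in a very large protein database
small_db = 10_000_000      # residues in a single-organism protein set

e_large = expect_value(bit_score, query_len, large_db)
e_small = expect_value(bit_score, query_len, small_db)

print(f"E-value against large database: {e_large:.3g}")
print(f"E-value against small database: {e_small:.3g}")
print(f"Fold improvement: {e_large / e_small:.0f}")
```

Because E scales linearly with database length, the same alignment score is 1000-fold more significant against the database that is 1000-fold smaller. With these assumed numbers, the alignment fails a threshold of E < 0.001 in the large database but passes it in the small one.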

