

A protein alignment scoring system sensitive at all evolutionary distances.

Author information

Altschul S F

Affiliation

National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894.

Publication information

J Mol Evol. 1993 Mar;36(3):290-300. doi: 10.1007/BF00160485.

PMID: 8483166
Abstract

Protein sequence alignments generally are constructed with the aid of a "substitution matrix" that specifies a score for aligning each pair of amino acids. Assuming a simple random protein model, it can be shown that any such matrix, when used for evaluating variable-length local alignments, is implicitly a "log-odds" matrix, with a specific probability distribution for amino acid pairs to which it is uniquely tailored. Given a model of protein evolution from which such distributions may be derived, a substitution matrix adapted to detecting relationships at any chosen evolutionary distance can be constructed. Because in a database search it generally is not known a priori what evolutionary distances will characterize the similarities found, it is necessary to employ an appropriate range of matrices in order not to overlook potential homologies. This paper formalizes this concept by defining a scoring system that is sensitive at all detectable evolutionary distances. The statistical behavior of this scoring system is analyzed, and it is shown that for a typical protein database search, estimating the originally unknown evolutionary distance appropriate to each alignment costs slightly over two bits of information, or somewhat less than a factor of five in statistical significance. A much greater cost may be incurred, however, if only a single substitution matrix, corresponding to the wrong evolutionary distance, is employed.
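The claim that any local-alignment substitution matrix is implicitly a log-odds matrix can be made concrete. Under the standard Karlin-Altschul assumptions (residues drawn independently with background frequencies p_i, a negative expected score, and at least one positive score), the matrix's implicit scale λ is the unique positive solution of Σ p_i p_j exp(λ s_ij) = 1, and the pair frequencies it is tailored to are q_ij = p_i p_j exp(λ s_ij). The Python below is a minimal sketch under those assumptions: it recovers λ and q_ij by bisection, reports the relative entropy in bits, builds matrices for several evolutionary distances from a hypothetical two-letter "flip" model (a toy stand-in for a real protein evolution model such as PAM), and converts a cost in bits into a factor in statistical significance. The two-letter alphabet, the example scores, and every function name are illustrative assumptions; this is not the paper's all-distance scoring system.

```python
import math
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def solve_lambda(scores: Matrix, freqs: Sequence[float], tol: float = 1e-12) -> float:
    """Return the unique lambda > 0 with sum_ij p_i p_j exp(lambda * s_ij) = 1."""
    n = len(freqs)
    pairs = [(i, j) for i in range(n) for j in range(n)]
    expected = sum(freqs[i] * freqs[j] * scores[i][j] for i, j in pairs)
    if expected >= 0 or max(scores[i][j] for i, j in pairs) <= 0:
        # Local-alignment statistics need a negative mean score and some positive score.
        raise ValueError("need a negative expected score and at least one positive score")

    def f(lam: float) -> float:
        return sum(freqs[i] * freqs[j] * math.exp(lam * scores[i][j]) for i, j in pairs) - 1.0

    # f(0) = 0, f'(0) = expected < 0 and f is convex, so f has exactly one
    # positive root. Double hi until f(hi) > 0, then bisect on [lo, hi].
    lo, hi = 0.0, 1.0
    while f(hi) <= 0:
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) > 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


def target_frequencies(scores: Matrix, freqs: Sequence[float], lam: float) -> List[List[float]]:
    """Pair frequencies q_ij = p_i p_j exp(lambda * s_ij) implied by the matrix; they sum to 1."""
    n = len(freqs)
    return [[freqs[i] * freqs[j] * math.exp(lam * scores[i][j]) for j in range(n)] for i in range(n)]


def relative_entropy_bits(scores: Matrix, freqs: Sequence[float], lam: float) -> float:
    """H = sum_ij q_ij * log(q_ij / (p_i p_j)) = sum_ij q_ij * lambda * s_ij, in bits."""
    q = target_frequencies(scores, freqs, lam)
    n = len(freqs)
    nats = sum(q[i][j] * lam * scores[i][j] for i in range(n) for j in range(n))
    return nats / math.log(2)


def matrix_for_distance(u: float, t: int) -> List[List[float]]:
    """Hypothetical 2-letter model: each step flips a residue with probability u.

    After t steps a residue is unchanged with probability a = 1/2 + (1 - 2u)**t / 2,
    so q_ii = a/2 and q_ij = (1 - a)/2 with p = (1/2, 1/2). Scores are log2-odds.
    """
    a = 0.5 + 0.5 * (1.0 - 2.0 * u) ** t
    same = math.log2((a / 2) / 0.25)
    diff = math.log2(((1 - a) / 2) / 0.25)
    return [[same, diff], [diff, same]]


def significance_factor(bits: float) -> float:
    """An information cost of `bits` bits multiplies an E-value by 2**bits."""
    return 2.0 ** bits


if __name__ == "__main__":
    p = [0.5, 0.5]
    s = [[1, -2], [-2, 1]]  # arbitrary match/mismatch scores, not from the paper
    lam = solve_lambda(s, p)
    print(f"lambda = {lam:.4f}")  # lambda = 0.4812 (natural log of the golden ratio)
    print("implied q_ij:", target_frequencies(s, p, lam))
    print(f"H = {relative_entropy_bits(s, p, lam):.3f} bits per aligned pair")

    # A matrix built as log2-odds for a chosen distance comes back with lambda = ln 2.
    for t in (10, 50, 200):
        m = matrix_for_distance(u=0.01, t=t)
        print(t, [round(x, 3) for x in m[0]], round(solve_lambda(m, p), 6))

    # "Slightly over two bits" corresponds to a factor between 4 and 5 in significance.
    print(significance_factor(2.0), significance_factor(math.log2(5)))
```

Multiplying a matrix by a constant c divides λ by c but leaves the implied q_ij unchanged, which is why λ is returned separately from the target frequencies. In the toy flip model, matrices for small t reward identities strongly and carry high relative entropy per pair, while matrices for large t have scores near zero, mirroring why a single matrix tuned to the wrong distance can miss homologies.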


Similar articles

1
A protein alignment scoring system sensitive at all evolutionary distances.
J Mol Evol. 1993 Mar;36(3):290-300. doi: 10.1007/BF00160485.
2
Scoredist: a simple and robust protein sequence distance estimator.
BMC Bioinformatics. 2005 Apr 27;6:108. doi: 10.1186/1471-2105-6-108.
3
Amino acid substitution matrices from an information theoretic perspective.
J Mol Biol. 1991 Jun 5;219(3):555-65. doi: 10.1016/0022-2836(91)90193-a.
4
Robust sequence alignment using evolutionary rates coupled with an amino acid substitution matrix.
BMC Bioinformatics. 2015 Aug 14;16:255. doi: 10.1186/s12859-015-0688-8.
5
Sequence alignment with an appropriate substitution matrix.
J Comput Biol. 2008 Mar;15(2):129-38. doi: 10.1089/cmb.2007.0155.
6
Amino acid substitution matrices from an artificial neural network model.
J Comput Biol. 2001;8(5):471-81. doi: 10.1089/106652701753216495.
7
The ranging of amino acids substitution matrices of various types in accordance with the alignment accuracy criterion.
BMC Bioinformatics. 2020 Sep 14;21(Suppl 11):294. doi: 10.1186/s12859-020-03616-0.
8
A transition probability model for amino acid substitutions from blocks.
J Comput Biol. 2003;10(6):997-1010. doi: 10.1089/106652703322756195.
9
Amino acid similarity matrix for homology modeling derived from structural alignment and optimized by the Monte Carlo method.
J Mol Graph Model. 1998 Aug-Dec;16(4-6):178-89, 254. doi: 10.1016/s1093-3263(98)80002-8.
10
ProClust: improved clustering of protein sequences with an extended graph-based approach.
Bioinformatics. 2002;18 Suppl 2:S182-91. doi: 10.1093/bioinformatics/18.suppl_2.s182.

Cited by

1
Genome sequence of cluster F1 phage Fastidio.
Microbiol Resour Announc. 2025 Apr 10;14(4):e0014825. doi: 10.1128/mra.00148-25. Epub 2025 Mar 19.
2
Computational Methods for the Discovery and Optimization of TAAR1 and TAAR5 Ligands.
Int J Mol Sci. 2024 Jul 27;25(15):8226. doi: 10.3390/ijms25158226.
3
Genome-wide identification and analysis of the cytokinin oxidase/dehydrogenase () gene family in finger millet ().