Aligning amino acid sequences: comparison of commonly used methods.

Authors

Feng D F, Johnson M S, Doolittle R F

Publication information

J Mol Evol. 1984;21(2):112-25. doi: 10.1007/BF02100085.

Abstract

We examined two extensive families of protein sequences using four different alignment schemes that employ various degrees of "weighting" in order to determine which approach is most sensitive in establishing relationships. All alignments used a similarity approach based on a general algorithm devised by Needleman and Wunsch. The approaches included a simple program, UM (unitary matrix), whereby only identities are scored; a scheme in which the genetic code is used as a basis for weighting (GC); another that employs a matrix based on structural similarity of amino acids taken together with the genetic basis of mutation (SG); and a fourth that uses the empirical log-odds matrix (LOM) developed by Dayhoff on the basis of observed amino acid replacements. The two sequence families examined were (a) nine different globins and (b) nine different tyrosine kinase-like proteins. It was assumed a priori that all members of a family share common ancestry. In cases where two sequences were more than 30% identical, alignments by all four methods were almost always the same. In cases where the percentage identity was less than 20%, however, there were often significant differences in the alignments. On the average, the Dayhoff LOM approach was the most effective in verifying distant relationships, as judged by an empirical "jumbling test." This was not universally the case, however, and in some instances the simple UM was actually as good or better. Trees constructed on the basis of the various alignments differed with regard to their limb lengths, but had essentially the same branching orders. We suggest some reasons for the different effectivenesses of the four approaches in the two different sequence settings, and offer some rules of thumb for assessing the significance of sequence relationships.
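The comparison described in the abstract rests on two ingredients: a Needleman-Wunsch global alignment score under a chosen substitution scheme, and a "jumbling test" that asks how far the real score stands above scores from shuffled sequences. The sketch below illustrates both for the simplest scheme, UM (identities only). It is not the authors' program: the function names, the linear gap penalty of -1, the number of shuffles, and the choice to shuffle both sequences are all assumptions made for illustration.

```python
import random
from statistics import mean, pstdev


def unitary_score(a, b):
    """UM-style scoring: 1 for an identity, 0 for any mismatch."""
    return 1 if a == b else 0


def nw_score(seq1, seq2, score=unitary_score, gap=-1):
    """Needleman-Wunsch global alignment score.

    The linear gap penalty (gap=-1) is an assumption of this sketch;
    the abstract does not state the penalty the authors used.
    """
    # prev[j] holds the best score for aligning seq1[:i-1] with seq2[:j].
    prev = [j * gap for j in range(len(seq2) + 1)]
    for i, a in enumerate(seq1, start=1):
        curr = [i * gap]
        for j, b in enumerate(seq2, start=1):
            curr.append(max(
                prev[j - 1] + score(a, b),  # align a with b
                prev[j] + gap,              # a aligned to a gap
                curr[j - 1] + gap,          # b aligned to a gap
            ))
        prev = curr
    return prev[-1]


def jumbling_test(seq1, seq2, trials=100, seed=0, score=unitary_score, gap=-1):
    """Compare the real score with scores of shuffled (jumbled) sequences.

    Returns (real_score, shuffled_mean, shuffled_sd, z), where z is the
    number of standard deviations by which the real score exceeds the
    shuffled mean. Shuffling both sequences is an assumption here.
    """
    rng = random.Random(seed)
    real = nw_score(seq1, seq2, score, gap)
    s1, s2 = list(seq1), list(seq2)
    shuffled = []
    for _ in range(trials):
        rng.shuffle(s1)  # preserves composition, destroys order
        rng.shuffle(s2)
        shuffled.append(nw_score(s1, s2, score, gap))
    mu = mean(shuffled)
    sd = pstdev(shuffled)
    z = (real - mu) / sd if sd > 0 else float("inf")
    return real, mu, sd, z
```

A weighted scheme such as GC, SG, or the Dayhoff log-odds matrix would be plugged in by passing a different `score` function that looks up a value for each amino acid pair; the alignment and jumbling logic stay the same.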
