Kosiol Carolin, Goldman Nick, Buttimore Nigel H
School of Mathematics, Trinity College, University of Dublin, Dublin 2, Ireland.
J Theor Biol. 2004 May 7;228(1):97-106. doi: 10.1016/j.jtbi.2003.12.010.
It is accepted that many evolutionary changes of amino acid sequence in proteins are conservative: the replacement of one amino acid by another residue has a far greater chance of being accepted if the two residues have similar properties. It is difficult, however, to identify relevant physicochemical properties that capture this similarity. In this paper, we introduce a criterion that determines similarity from an evolutionary point of view. Our criterion is based on the description of protein evolution by a Markov process and the corresponding matrix of instantaneous replacement rates. It is inspired by the conductance, a quantity that reflects the strength of mixing in a Markov process. Furthermore, we introduce a method to divide the 20 amino acid residues into subsets that achieve good scores with our criterion. The criterion has the time-invariance property that different time distances of the same amino acid replacement rate matrix lead to the same grouping, whereas different rate matrices lead to different groupings. Therefore, it can be used as an automated method to compare matrices derived from consideration of different types of proteins, or from parts of proteins sharing different structural or functional features. We present the groupings resulting from two standard matrices used in sequence alignment and phylogenetic tree estimation.
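The abstract does not state the criterion's formula, so the following is only a rough sketch of the general idea behind conductance, using the common textbook definition rather than the authors' actual score or search method. It assumes a reversible rate matrix with entries q_ij = s_ij * pi_j, built from made-up symmetric exchangeabilities s_ij and stationary frequencies pi_j, on a toy alphabet of four states instead of 20 amino acids. All function names and numbers are hypothetical.

```python
from itertools import combinations


def rate_matrix(exch, pi):
    """Build a reversible rate matrix q[i][j] = exch[i][j] * pi[j] (i != j).

    Each diagonal entry is set so that its row sums to zero.
    """
    n = len(pi)
    q = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                q[i][j] = exch[i][j] * pi[j]
        q[i][i] = -sum(q[i])  # diagonal is still 0.0 here, so this sums the off-diagonals
    return q


def conductance(q, pi, subset):
    """Textbook conductance of a subset: stationary flow out of the subset
    divided by the smaller of the stationary masses on either side."""
    inside = set(subset)
    outside = [j for j in range(len(pi)) if j not in inside]
    flow = sum(pi[i] * q[i][j] for i in inside for j in outside)
    mass = sum(pi[i] for i in inside)
    return flow / min(mass, 1.0 - mass)


def best_two_way_split(q, pi):
    """Brute-force search for the two-way split with the lowest conductance.

    State 0 is always placed in the subset so each split is visited once.
    """
    n = len(pi)
    best = None
    for size in range(1, n):
        for rest in combinations(range(1, n), size - 1):
            subset = (0,) + rest
            score = conductance(q, pi, subset)
            if best is None or score < best[0]:
                best = (score, subset)
    return best


# Toy data (invented): states 0 and 1 exchange readily, as do states 2 and 3,
# while exchanges across the two pairs are rare.
pi = [0.25, 0.25, 0.25, 0.25]
exch = [
    [0.0, 2.0, 0.1, 0.1],
    [2.0, 0.0, 0.1, 0.1],
    [0.1, 0.1, 0.0, 2.0],
    [0.1, 0.1, 2.0, 0.0],
]
q = rate_matrix(exch, pi)
score, split = best_two_way_split(q, pi)
print(split, score)
```

In this toy case the lowest-conductance split separates states {0, 1} from {2, 3}, meaning replacements within each group are frequent and replacements between groups are rare. Exhaustive enumeration is only feasible for tiny alphabets; partitioning 20 residues into several subsets would presumably need a heuristic search, which the abstract does not describe.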