Qian B, Goldstein RA
Biophysics Research Division, University of Michigan, Ann Arbor, USA.
Proteins. 2001 Oct 1;45(1):102-4. doi: 10.1002/prot.1129.
Protein sequence alignment has become a widely used method in the study of newly sequenced proteins. Most sequence alignment methods use an affine gap penalty to score insertions and deletions. Although affine gap penalties capture the relative ease of extending a gap compared with opening one, they remain an obvious oversimplification of the processes that actually occur during sequence evolution. To improve the efficiency of sequence alignment methods and to better understand the process of sequence evolution, we sought a more accurate model of insertions and deletions in homologous proteins. In this work, we extract the probability of gap occurrence and the resulting gap-length distribution in distantly related proteins (sequence identity < 25%) from alignments based on their common structures. We observe a gap distribution that can be fitted by a multiexponential with four distinct components. The results suggest new approaches to modeling insertions and deletions in sequence alignments.
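The contrast between an affine gap penalty and a multiexponential gap-length distribution can be made concrete with a short sketch. The abstract gives neither the exact functional form nor the fitted parameters, so the following is a minimal illustration under stated assumptions: each component is taken to be a discrete exponential (geometric) distribution over gap lengths l ≥ 1, the penalty is taken to be the negative log-probability of the gap length (ignoring the separate probability that a gap occurs at all), and the four weights and decay lengths are hypothetical placeholders, not values from this study. The function names are likewise illustrative.

```python
import math
from typing import Sequence


def affine_gap_penalty(length: int, gap_open: float, gap_extend: float) -> float:
    """Affine penalty: one opening cost plus a constant cost per extra gap position."""
    if length < 1:
        raise ValueError("gap length must be at least 1")
    return gap_open + gap_extend * (length - 1)


def multiexp_gap_probability(length: int, weights: Sequence[float],
                             decay_lengths: Sequence[float]) -> float:
    """P(l) for a weighted mixture of geometric (discrete exponential) components.

    Assumption: component i has ratio r_i = exp(-1 / tau_i) and is normalized
    over l >= 1, so the mixture sums to 1 whenever the weights sum to 1.
    """
    if length < 1:
        raise ValueError("gap length must be at least 1")
    total = 0.0
    for w, tau in zip(weights, decay_lengths):
        r = math.exp(-1.0 / tau)
        total += w * (1.0 - r) * r ** (length - 1)
    return total


def multiexp_gap_penalty(length: int, weights: Sequence[float],
                         decay_lengths: Sequence[float]) -> float:
    """Length-dependent penalty: negative log-probability of the gap length."""
    return -math.log(multiexp_gap_probability(length, weights, decay_lengths))


if __name__ == "__main__":
    # Hypothetical four-component mixture (illustrative numbers only).
    weights = [0.5, 0.3, 0.15, 0.05]
    decay_lengths = [1.0, 3.0, 10.0, 40.0]

    # A single component reproduces an affine penalty exactly:
    # -log((1 - r) r^(l-1)) = -log(1 - r) + (l - 1) * (-log r).
    r = math.exp(-1.0 / 5.0)
    gap_open, gap_extend = -math.log(1.0 - r), -math.log(r)

    for l in (1, 2, 5, 10, 20, 50):
        print(l,
              round(affine_gap_penalty(l, gap_open, gap_extend), 3),
              round(multiexp_gap_penalty(l, weights, decay_lengths), 3))
```

With one component the two penalties coincide, which shows that the affine model is the single-exponential special case. With several components the marginal cost of lengthening a gap shrinks as the gap grows, because long gaps are increasingly dominated by the slowest-decaying component, whereas an affine penalty keeps that marginal cost constant.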