Sasidharan Rajkumar, Chothia Cyrus
Medical Research Council Laboratory of Molecular Biology, Hills Road, Cambridge, United Kingdom.
Proc Natl Acad Sci U S A. 2007 Jun 12;104(24):10080-5. doi: 10.1073/pnas.0703737104. Epub 2007 May 31.
We have determined the general constraints that govern sequence divergence in proteins that retain entirely, or very largely, the same structure and function. To do this we collected data from three different groups of orthologous sequences: those found in humans and mice, in humans and chickens, and in Escherichia coli and Salmonella enterica. In total, these organisms have 21,738 suitable pairs of orthologs, and these contain nearly 2 million mutations. The three groups differ greatly in the taxa from which they come and/or in the time that separates them from their last common ancestor. Nevertheless, the results we obtain from the three different groups are strikingly similar. For each group, the orthologous sequence pairs were assigned to six different divergence categories on the basis of their sequence identities. For categories with the same divergence, common accepted mutations have similar frequencies and rank orders in the three groups. With divergence, the width of the range of common mutations grows in the same manner in each group. We examined the distribution of mutations in protein structures. With increasing divergence, mutations increase at different rates in the buried, intermediate, and exposed regions of protein structures in a manner that explains the exponential relationship between the divergence of structure and sequence. This work implies that commonly allowed mutations are selected by a set of general constraints that are well defined and whose nature varies with divergence.
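The grouping step described in the abstract, where each orthologous pair is assigned to one of six divergence categories on the basis of its sequence identity, can be illustrated with a minimal sketch. The abstract does not give the category boundaries or the alignment procedure, so the cut-offs in `CATEGORY_LOWER_BOUNDS`, the function names, and the treatment of gaps below are assumptions made for illustration only and are not the authors' method.

```python
from bisect import bisect_right
from collections import Counter

# Hypothetical lower bounds (percent identity) separating six categories.
# The abstract does not state the cut-offs actually used.
CATEGORY_LOWER_BOUNDS = [50, 60, 70, 80, 90]


def aligned_pairs(aligned_a, aligned_b, gap="-"):
    """Return residue pairs from two pre-aligned sequences, skipping gapped columns."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    return [(a, b) for a, b in zip(aligned_a, aligned_b) if a != gap and b != gap]


def percent_identity(aligned_a, aligned_b):
    """Percentage of ungapped aligned positions carrying the same residue."""
    pairs = aligned_pairs(aligned_a, aligned_b)
    if not pairs:
        raise ValueError("no ungapped aligned positions")
    return 100.0 * sum(a == b for a, b in pairs) / len(pairs)


def divergence_category(identity):
    """Map a percent identity to a category index, 0 (most divergent) to 5 (least)."""
    return bisect_right(CATEGORY_LOWER_BOUNDS, identity)


def count_substitutions(aligned_a, aligned_b):
    """Count mismatched residue pairs, ignoring direction (C/V equals V/C)."""
    pairs = aligned_pairs(aligned_a, aligned_b)
    return Counter(tuple(sorted(p)) for p in pairs if p[0] != p[1])
```

Summing the substitution counts over all pairs in one category would give the kind of per-category mutation frequencies whose rank orders the abstract compares across the three groups.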