Department of Applied Physics, Stanford University, Stanford, California, United States of America.
PLoS Genet. 2011 Feb;7(2):e1001315. doi: 10.1371/journal.pgen.1001315. Epub 2011 Feb 24.
Here we investigate the correlations between coding sequence substitutions as a function of their separation along the protein sequence. We consider both substitutions between the reference genomes of several Drosophilids and polymorphisms in a population sample of Zimbabwean Drosophila melanogaster. We find that amino acid substitutions are "clustered" along the protein sequence; that is, the frequency of additional substitutions is strongly enhanced within ≈10 residues of a first such substitution. No such clustering is observed for synonymous substitutions, supporting a "correlation length" associated with selection on proteins as the causative mechanism. Clustering is stronger between substitutions that arose in the same lineage than between substitutions that arose in different lineages. We consider several possible origins of clustering and conclude that epistasis (interactions between amino acids within a protein that affect function) and positional heterogeneity in the strength of purifying selection are primarily responsible. The role of epistasis is directly supported by two observations: nearby substitutions that arose on the same lineage tend to preserve the total charge of the residues within the correlation length, and neighboring derived alleles preferentially cosegregate in our population sample. We interpret the observed length scale of clustering as a statistical reflection of the functional locality (or modularity) of proteins: amino acids that are near each other on the protein backbone are more likely to contribute to, and collaborate toward, a common subfunction.
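The clustering statistic described above (how often a second substitution occurs at a given separation from a first one) can be illustrated as a pair-correlation ratio: the number of substitution pairs observed at separation d, divided by the number expected if the same number of substitutions were placed uniformly at random within each protein. This is a minimal sketch under assumptions not stated in the abstract: positions are 0-based residue indices, proteins are pooled by summing observed and expected counts, and the null model is uniform placement without replacement. The function names are hypothetical, and this is not presented as the authors' published method.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def pair_counts(positions: Sequence[int], max_sep: int) -> Counter:
    """Count pairs of distinct substitution sites separated by d residues, 1 <= d <= max_sep."""
    sites = sorted(set(positions))
    counts: Counter = Counter()
    for i, a in enumerate(sites):
        for b in sites[i + 1:]:
            d = b - a
            if d > max_sep:
                break  # sites are sorted, so every later site is even farther away
            counts[d] += 1
    return counts


def expected_pairs(n: int, length: int, d: int) -> float:
    """Expected pairs at separation d if n sites are drawn uniformly without
    replacement from a protein of `length` residues (assumed null model)."""
    if n < 2 or d >= length:
        return 0.0
    # (length - d) site pairs lie at separation d; each is doubly hit with
    # probability n(n-1) / (length(length-1)).
    return (length - d) * n * (n - 1) / (length * (length - 1))


def clustering_profile(
    proteins: List[Tuple[int, Sequence[int]]], max_sep: int
) -> Dict[int, float]:
    """Observed/expected pair counts at each separation, pooled over proteins.

    proteins: list of (protein_length, substitution_positions).
    A ratio above 1 at separation d means substitutions co-occur at that
    distance more often than uniform random placement predicts.
    """
    observed: Counter = Counter()
    expected = {d: 0.0 for d in range(1, max_sep + 1)}
    for length, positions in proteins:
        n = len(set(positions))
        observed.update(pair_counts(positions, max_sep))
        for d in expected:
            expected[d] += expected_pairs(n, length, d)
    return {d: observed[d] / e for d, e in expected.items() if e > 0}


if __name__ == "__main__":
    # Hypothetical toy input: one 100-residue protein with three nearby substitutions.
    profile = clustering_profile([(100, [10, 12, 13, 50])], max_sep=5)
    for d, ratio in profile.items():
        print(d, round(ratio, 2))
```

Under this framing, clustering shows up as ratios well above 1 at small d that decay toward 1 at larger separations. Running the same function on synonymous sites gives the comparison against which amino acid clustering is judged.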