Graduate program in Bioinformatics, North Carolina State University, Raleigh, NC 27695-7566, USA.
BMC Genomics. 2011 Aug 16;12:415. doi: 10.1186/1471-2164-12-415.
Protein sequences are subject to a mosaic of constraints. Changes to functional domains and buried residues, for example, are more likely to disrupt protein structure and function than are changes to residues that participate in loops or are exposed to solvent. Regions of constraint on the tertiary structure of a protein often result in loose segmentation of its primary structure into stretches of slowly and rapidly evolving amino acids. This clustering can be exploited, and existing methods have done so by using local sequence conservation as a signature of selection to help identify functionally important regions within proteins. We invert this paradigm by leveraging the regional nature of protein structure and function to both illuminate and make use of genome-wide patterns of local sequence conservation.
Our hypothesis is that the regional nature of structural and functional constraints will impose a positive autocorrelation on the evolutionary rates of neighboring sites, which, in a pairwise comparison of orthologous proteins, will manifest as the clustering of non-synonymous changes along the amino acid sequence. We introduce a dispersion ratio statistic to test this and related hypotheses. Using genome-wide interspecific comparisons of orthologous protein pairs, we reveal a strong log-linear relationship between the degree of clustering and the intensity of constraint. We further demonstrate how this relationship varies with the evolutionary distance between the species being compared. We provide some evidence that proteins with a history of positive selection deviate from genome-wide trends.
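The abstract does not define the dispersion ratio, so the following is only an illustrative sketch of one way clustering of changes could be quantified, not the authors' statistic. It assumes a pre-aligned pair of orthologous protein sequences of equal length, splits the alignment into non-overlapping windows, and computes a variance-to-mean ratio of the number of differing sites per window. The function names `difference_mask` and `dispersion_ratio` and the `window` parameter are hypothetical. Under this stand-in measure, values above 1 suggest clustered changes and values below 1 suggest evenly spread changes.

```python
from statistics import mean, pvariance


def difference_mask(seq_a, seq_b):
    """Mark each ungapped aligned column as changed (True) or unchanged (False).

    Assumption: seq_a and seq_b are already aligned and of equal length,
    with '-' denoting a gap. Columns containing a gap are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    return [a != b for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]


def dispersion_ratio(mask, window=10):
    """Variance-to-mean ratio of change counts in non-overlapping windows.

    This is a generic index-of-dispersion stand-in, not the statistic
    defined in the paper. A trailing partial window is ignored.
    """
    counts = [
        sum(mask[start:start + window])
        for start in range(0, len(mask) - window + 1, window)
    ]
    if len(counts) < 2:
        raise ValueError("need at least two full windows")
    average = mean(counts)
    if average == 0:
        return float("nan")  # no changes at all: the ratio is undefined
    return pvariance(counts) / average


if __name__ == "__main__":
    reference = "A" * 40
    # Four changes packed into the first window.
    clustered = "CCCC" + "A" * 36
    # Four changes, one per window.
    spread = "".join("C" if i % 10 == 0 else "A" for i in range(40))

    print(dispersion_ratio(difference_mask(reference, clustered)))
    print(dispersion_ratio(difference_mask(reference, spread)))
```

In this toy example the clustered pair gives a ratio of 3.0 and the evenly spread pair gives 0.0, although both contain the same number of changes. The window size strongly affects such a measure, so it would need to be chosen with care for real data.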
We find a significant association between the evolutionary rate of a protein and the degree to which non-synonymous changes cluster along its primary sequence. We show that clustering is a non-redundant predictor of evolutionary rate, and we speculate that conflicting signals of clustering and constraint may be indicative of a historical period of relaxed selection.