Moses Alan M, Chiang Derek Y, Kellis Manolis, Lander Eric S, Eisen Michael B
Graduate Group in Biophysics, University of California, Berkeley, CA 94720, USA.
BMC Evol Biol. 2003 Aug 28;3:19. doi: 10.1186/1471-2148-3-19.
The binding sites of sequence-specific transcription factors are an important and relatively well-understood class of functional non-coding DNAs. Although a wide variety of experimental and computational methods have been developed to characterize transcription factor binding sites, they remain difficult to identify. Comparison of non-coding DNA from related species has shown considerable promise in identifying these functional non-coding sequences, even though relatively little is known about their evolution.
Here we analyse the genome sequences of the budding yeasts Saccharomyces cerevisiae, S. bayanus, S. paradoxus and S. mikatae to study the evolution of transcription factor binding sites. As expected, we find that both experimentally characterized and computationally predicted binding sites evolve more slowly than surrounding sequence, consistent with the hypothesis that they are under purifying selection. We also observe position-specific variation in the rate of evolution within binding sites. We find that the position-specific rate of evolution is positively correlated with degeneracy among binding sites within S. cerevisiae. We test theoretical predictions for the rate of evolution at positions where the base frequencies deviate from background due to purifying selection and find reasonable agreement with the observed rates of evolution. Finally, we show how the evolutionary characteristics of real binding motifs can be used to distinguish them from artefacts of computational motif finding algorithms.
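The abstract does not spell out how degeneracy is measured or which model produces the "theoretical predictions". The sketch below is a minimal illustration under stated assumptions, not the authors' code. It assumes that degeneracy at a position is measured as the Shannon entropy of its base frequencies across aligned sites, and that the predicted rate comes from the Halpern-Bruno (1998) fixation model combined with an F81-style mutation model, in which mutation to base b occurs at a rate proportional to its background frequency. The function names, pseudocount, background composition and toy sites are all hypothetical.

```python
import math

# Hypothetical helper names; nucleotides assumed to be uppercase A/C/G/T only.
BASES = "ACGT"


def column_frequencies(sites, pseudocount=0.5):
    """Per-position base frequencies from equal-length, ungapped aligned sites.

    A pseudocount keeps every frequency above zero so the log-ratio in the
    Halpern-Bruno factor below is always defined.
    """
    freqs = []
    for i in range(len(sites[0])):
        counts = {b: pseudocount for b in BASES}
        for site in sites:
            counts[site[i]] += 1
        total = sum(counts.values())
        freqs.append({b: counts[b] / total for b in BASES})
    return freqs


def entropy_bits(freq):
    """Shannon entropy of one position in bits (2 bits = uniform over ACGT).

    Used here as an assumed stand-in measure of degeneracy among sites.
    """
    return -sum(p * math.log2(p) for p in freq.values() if p > 0)


def halpern_bruno_relative_rate(freq, background):
    """Expected substitution rate at a position, relative to a neutral site.

    freq: equilibrium base frequencies at the position (assumed to reflect
    purifying selection). background: neutral base composition. Mutation is
    assumed F81-like, M[a][b] = mu * background[b]; mu cancels in the ratio.
    """
    rate = 0.0
    for a in BASES:
        for b in BASES:
            if a == b:
                continue
            # Scaled selection S = ln(pi_b * M_ba / (pi_a * M_ab)).
            s = math.log((freq[b] * background[a]) / (freq[a] * background[b]))
            # Halpern-Bruno fixation factor S / (1 - e^-S), which tends to 1 as S -> 0.
            fixation = 1.0 if abs(s) < 1e-12 else s / -math.expm1(-s)
            rate += freq[a] * background[b] * fixation
    # Neutral rate under the same mutation model: sum_a bg_a * (1 - bg_a).
    neutral = 1.0 - sum(p * p for p in background.values())
    return rate / neutral


if __name__ == "__main__":
    # Toy aligned sites and an assumed AT-rich background; not data from the paper.
    sites = ["CACGTG", "CACGTG", "CACGTA", "TACGTG", "CACGCG"]
    background = {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3}
    for pos, f in enumerate(column_frequencies(sites), start=1):
        print(pos, round(entropy_bits(f), 3),
              round(halpern_bruno_relative_rate(f, background), 3))
```

Under this model, a position whose frequencies equal the background evolves at exactly the neutral rate (relative rate 1). Positions that depart further from background, such as an invariant column, get a lower predicted rate. This matches the direction of the positive correlation between rate and degeneracy described above.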
As has been observed for protein sequences, the rate of evolution in transcription factor binding sites varies with position, suggesting that some regions are under stronger functional constraint than others. This variation likely reflects the varying importance of different positions in the formation of the protein-DNA complex. The characterization of the pattern of evolution in known binding sites will likely contribute to the effective use of comparative sequence data in the identification of transcription factor binding sites and is an important step toward understanding the evolution of functional non-coding DNA.