Tuğrul Murat, Paixão Tiago, Barton Nicholas H, Tkačik Gašper
Institute of Science and Technology Austria, Klosterneuburg, Austria.
PLoS Genet. 2015 Nov 6;11(11):e1005639. doi: 10.1371/journal.pgen.1005639. eCollection 2015 Nov.
Evolution of gene regulation is crucial for our understanding of the phenotypic differences between species, populations and individuals. Sequence-specific binding of transcription factors to the regulatory regions on the DNA is a key regulatory mechanism that determines gene expression and hence heritable phenotypic variation. We use a biophysical model for directional selection on gene expression to estimate the rates of gain and loss of transcription factor binding sites (TFBS) in finite populations under both point and insertion/deletion mutations. Our results show that these rates are typically slow for a single TFBS in an isolated DNA region, unless the selection is extremely strong. These rates decrease drastically with increasing TFBS length or increasingly specific protein-DNA interactions, making the evolution of sites longer than ~10 bp unlikely on typical eukaryotic speciation timescales. Similarly, evolution converges to the stationary distribution of binding sequences very slowly, making the equilibrium assumption questionable. The availability of longer regulatory sequences in which multiple binding sites can evolve simultaneously, the presence of "pre-sites" or partially decayed old sites in the initial sequence, and biophysical cooperativity between transcription factors, can all facilitate gain of TFBS and reconcile theoretical calculations with timescales inferred from comparative genomics.
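The abstract's claim that gain rates fall sharply with TFBS length can be illustrated with a simple counting argument. The sketch below is not the authors' biophysical model. It assumes a hypothetical rule in which a sequence counts as a binding site if it differs from a single consensus at no more than `max_mismatches` positions, with all four bases equally likely. Under that assumption, the fraction of random sequences that already qualify as sites shrinks roughly geometrically as the site gets longer, so a random starting sequence sits further from any functional site.

```python
from math import comb


def fraction_functional(site_length: int, max_mismatches: int) -> float:
    """Fraction of all 4**L sequences within `max_mismatches` of one consensus.

    Assumption (illustrative only): a sequence is a functional site if its
    Hamming distance to the consensus is at most `max_mismatches`.
    """
    # Sequences at exactly k mismatches: choose the k positions, then pick
    # one of the 3 non-consensus bases at each of them.
    n_functional = sum(
        comb(site_length, k) * 3**k for k in range(max_mismatches + 1)
    )
    return n_functional / 4**site_length


if __name__ == "__main__":
    # The mismatch tolerance of 2 is an arbitrary choice for illustration.
    for length in (6, 8, 10, 12, 14):
        print(length, fraction_functional(length, max_mismatches=2))
```

The threshold rule and the value of `max_mismatches` are placeholders. The paper's model additionally involves binding energies, selection strength, population size and indel mutations, none of which are represented here.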