Department of Evolutionary Biology and Environmental Studies, University of Zürich, Switzerland.
Swiss Institute of Bioinformatics, Quartier Sorge-Batiment Genopode, Lausanne, Switzerland.
Genome Biol Evol. 2021 Dec 1;13(12). doi: 10.1093/gbe/evab273.
Mutations in DNA sequences that bind transcription factors and thus modulate gene expression are a source of adaptive variation in gene expression. To understand how transcription factor binding sequences evolve in natural populations of the thale cress Arabidopsis thaliana, we integrated genomic polymorphism data for loci bound by transcription factors with in vitro data on binding affinity for these transcription factors. Specifically, we studied 19 different transcription factors, and the allele frequencies of 8,333 genomic loci bound in vivo by these transcription factors in 1,135 A. thaliana accessions. We find that transcription factor binding sequences show very low genetic diversity, suggesting that they are subject to purifying selection. High-frequency alleles of such binding sequences tend to bind transcription factors strongly. Conversely, alleles that are absent from the population tend to bind them weakly. In addition, high-frequency alleles tend to be the endpoints of many accessible evolutionary paths. We show that both high affinity and high evolutionary accessibility contribute to high allele frequency for at least some transcription factors. Although binding sequences with stronger affinity are more frequent, we did not find them to be associated with higher gene expression levels. Epistatic interactions among individual mutations that alter binding affinity are pervasive and can help explain variation in accessibility among binding sequences. In summary, combining in vitro binding affinity data with in vivo binding sequence data can help us understand the forces that affect the evolution of transcription factor binding sequences in natural populations.
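The notion of an allele being the endpoint of many accessible evolutionary paths can be illustrated with a small sketch. The abstract does not define accessibility, so the sketch assumes one common reading: a path is a series of single-nucleotide changes, each of which strictly increases binding affinity. The function names and the toy affinity values are hypothetical and are not taken from the study.

```python
from functools import lru_cache

BASES = "ACGT"


def neighbors(seq):
    """Yield every sequence that differs from seq by one point mutation."""
    for i, old in enumerate(seq):
        for new in BASES:
            if new != old:
                yield seq[:i] + new + seq[i + 1:]


def count_accessible_paths(affinity):
    """Count affinity-increasing mutational paths ending at each sequence.

    affinity maps sequence -> binding affinity (hypothetical values).
    Assumption: a step is accessible only if it strictly increases
    affinity, so no path can revisit a sequence and the recursion ends.
    """

    @lru_cache(maxsize=None)
    def paths_into(seq):
        total = 0
        for prev in neighbors(seq):
            if prev in affinity and affinity[prev] < affinity[seq]:
                # One path is the single step prev -> seq; the others
                # extend each accessible path that ends at prev.
                total += 1 + paths_into(prev)
        return total

    return {seq: paths_into(seq) for seq in affinity}


# Toy two-nucleotide landscape with made-up affinities.
toy_affinity = {"AA": 1.0, "AC": 0.6, "CA": 0.5, "CC": 0.1}
counts = count_accessible_paths(toy_affinity)
print(counts)
```

In this toy landscape the highest-affinity sequence is reached by four increasing paths (AC→AA, CA→AA, CC→AC→AA, CC→CA→AA), while the lowest-affinity sequence is reached by none. Epistasis would show up here as a landscape in which the effect of a mutation on affinity depends on the other position, which can open or close such paths.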