Department of Biochemistry, University of Toronto, Toronto, Ontario, M5G1M1, Canada.
Department of Electrical Engineering and Computer Science, York University, Toronto, Ontario, M3J1P3, Canada.
Genome Biol. 2024 Aug 13;25(1):219. doi: 10.1186/s13059-024-03364-x.
In vertebrates, most protein-coding genes have a peak of GC-content near their 5' transcriptional start site (TSS). This feature promotes both the efficient nuclear export and translation of mRNAs. Despite the importance of GC-content for RNA metabolism, its general features, origin, and maintenance remain mysterious. We investigate the evolutionary forces shaping GC-content at the TSS of genes, both by comparative genomic analysis of nucleotide substitution rates between species and by examining human de novo mutations.
Our data suggest that GC-peaks at TSSs were present in the last common ancestor of amniotes, and likely in that of vertebrates. We observe that in apes and rodents, where recombination is directed away from TSSs by PRDM9, GC-content at the 5' end of protein-coding genes is currently undergoing mutational decay. In canids, which lack PRDM9 and perform recombination at TSSs, GC-content at the 5' end of protein-coding genes is increasing. We show that these patterns extend into the 5' end of the open reading frame, thus impacting choices at synonymous codon positions.
Our results indicate that the dynamics of this GC-peak in amniotes are largely shaped by historical patterns of recombination. Since decay of GC-content towards the mutation rate equilibrium is the default state for non-functional DNA, the observed decrease in GC-content at TSSs in apes and rodents indicates that the GC-peak is not being maintained by selection on most protein-coding genes in those species.
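The idea of decay towards a mutation rate equilibrium can be illustrated with a minimal sketch. It assumes a simple two-state model (each site is either GC or AT) with no selection and no recombination-associated bias; this is a standard textbook simplification, not the method used in the study. The function names and the rate values are hypothetical and chosen only for illustration.

```python
def equilibrium_gc(rate_at_to_gc: float, rate_gc_to_at: float) -> float:
    """Equilibrium GC fraction under a two-state mutation model.

    Assumes mutation is the only force acting (no selection, no biased
    gene conversion). Rates are per site per time step.
    """
    total = rate_at_to_gc + rate_gc_to_at
    if total <= 0:
        raise ValueError("mutation rates must sum to a positive value")
    return rate_at_to_gc / total


def gc_trajectory(gc_start: float, rate_at_to_gc: float,
                  rate_gc_to_at: float, steps: int) -> list[float]:
    """Expected GC fraction over time, starting from gc_start.

    Each step, a fraction of AT sites mutates to GC and a fraction of
    GC sites mutates to AT.
    """
    gc = gc_start
    trajectory = [gc]
    for _ in range(steps):
        gc = gc + rate_at_to_gc * (1.0 - gc) - rate_gc_to_at * gc
        trajectory.append(gc)
    return trajectory


# Illustrative (made-up) rates: GC->AT mutations twice as frequent as AT->GC.
AT_TO_GC = 0.01
GC_TO_AT = 0.02

if __name__ == "__main__":
    target = equilibrium_gc(AT_TO_GC, GC_TO_AT)
    path = gc_trajectory(0.70, AT_TO_GC, GC_TO_AT, steps=1000)
    print(f"equilibrium GC fraction: {target:.3f}")
    print(f"GC fraction after 1000 steps: {path[-1]:.3f}")
```

In this toy model, a region that starts GC-rich (here 70%) drifts steadily down towards the equilibrium set by the two mutation rates unless some other force, such as selection or recombination-associated processes, counteracts the decay.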