Milner Centre for Evolution, University of Bath, Bath, United Kingdom.
Genome Biol Evol. 2022 Aug 3;14(8). doi: 10.1093/gbe/evac115.
Protein-coding genes terminate with one of three stop codons (TAA, TGA, or TAG) that, like synonymous codons, are not used equally. Because TGA and TAG have identical nucleotide content, analysis of their differential usage provides an unusual window into the forces operating on what are ostensibly functionally identical residues. Across genomes, and between isochores within the human genome, TGA usage increases with G + C content but, given a common G + C → A + T mutation bias, this cannot be explained by mutation bias-drift equilibrium. Increased usage of TGA in G + C-rich genomes or genomic regions is also unlikely to reflect selection for the optimal stop codon, as TAA appears to be universally optimal, probably because it has the lowest read-through rate. Despite TAA being favored by both selection and mutation bias, G + C pressure is, as with codon usage bias, the prime determinant of between-species TGA usage trends. In species with strong G + C-biased gene conversion (gBGC), such as mammals and birds, the high usage and conservation of TGA is best explained by an A + T → G + C repair bias. How to explain TGA enrichment in other G + C-rich genomes is less clear. Enigmatically, across bacterial and archaeal species, and between human isochores, TAG usage is mostly unresponsive to G + C pressure. We dub this unresponsiveness the TAG paradox, as currently no mutational, selective, or gBGC model provides a well-supported explanation. That TAG usage does increase with G + C content across eukaryotes makes its unresponsiveness elsewhere yet more enigmatic. We suggest that resolution of the TAG paradox may provide insights into either an unknown but common selective preference (probably at the DNA/RNA level) or an unrecognized complexity in the action of gBGC.
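The quantities discussed above (relative usage of TAA, TGA, and TAG, and G + C content) can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function names `gc_content` and `stop_codon_usage` and the toy sequences are hypothetical, and the sketch assumes each input is an in-frame coding sequence whose last three bases are its stop codon.

```python
from collections import Counter

STOP_CODONS = ("TAA", "TGA", "TAG")


def gc_content(seq: str) -> float:
    """Fraction of unambiguous bases (A, C, G, T) that are G or C."""
    seq = seq.upper()
    gc = sum(1 for base in seq if base in "GC")
    acgt = sum(1 for base in seq if base in "ACGT")
    return gc / acgt if acgt else 0.0


def stop_codon_usage(cds_list):
    """Relative usage of each stop codon across a set of coding sequences.

    Assumption: every sequence is in frame and ends with its stop codon.
    Sequences whose length is not a multiple of three, or whose final
    codon is not a standard stop, are skipped.
    """
    counts = Counter()
    for cds in cds_list:
        cds = cds.upper()
        if len(cds) < 3 or len(cds) % 3 != 0:
            continue
        stop = cds[-3:]
        if stop in STOP_CODONS:
            counts[stop] += 1
    total = sum(counts.values())
    return {codon: (counts[codon] / total if total else 0.0)
            for codon in STOP_CODONS}


# Toy, made-up sequences purely for illustration (not real genes).
toy_cds = ["ATGGCCTGA", "ATGAAATAA", "ATGGGCTGA", "ATGTTTTAG"]
usage = stop_codon_usage(toy_cds)
gc = gc_content("".join(toy_cds))
print(usage, round(gc, 3))
```

Computing these two values per genome (or per isochore) and plotting stop codon usage against G + C content is one simple way to look for the kind of trend described for TGA, and its absence for TAG.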