Fournier G P, Alm E J
Department of Earth, Atmospheric, and Planetary Sciences, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
J Mol Evol. 2015 Apr;80(3-4):171-85. doi: 10.1007/s00239-015-9672-1. Epub 2015 Mar 20.
The genetic code was likely complete in its current form by the time of the last universal common ancestor (LUCA). Several scenarios have been proposed to explain the code's pre-LUCA emergence and expansion, and the relative order in which the amino acids used in translation appeared. One co-evolutionary model of genetic code expansion proposes that at least some amino acids were added to the code through the ancient divergence of aminoacyl-tRNA synthetase (aaRS) families. Of all the amino acids used within the genetic code, Trp is most frequently claimed to be a relatively recent addition. We observe that, because TrpRS and TyrRS are paralogous protein families that retain significant sequence similarity, the inferred sequence composition of their common ancestor can be used to evaluate this co-evolutionary model of genetic code expansion. We show that ancestral sequence reconstructions of the pre-LUCA paralog ancestor of TyrRS and TrpRS have several sites containing Tyr, yet no sites containing Trp. This is consistent with the paralog ancestor having been specific for Tyr, with Trp added to the genetic code later through aaRS divergence and neofunctionalization. Only after this divergence could Trp be specifically encoded and incorporated into proteins, including the TyrRS and TrpRS descendant lineages themselves. This early absence of Trp is observed under both homogeneous and non-homogeneous models of ancestral sequence reconstruction. Simulations indicate that the observed absence of Trp is unlikely to be due to chance or model bias. These results support the view that the final stages of genetic code evolution occurred well within the "protein world," and that the presence or absence of Trp at conserved sites of ancient protein domains is a likely measure of their relative antiquity, permitting the relative timing of extremely early events in protein evolution before LUCA.