Trifonov E N, Kirzhner A, Kirzhner V M, Berezovsky I N
Department of Structural Biology, The Weizmann Institute of Science, Rehovot 76100, Israel.
J Mol Evol. 2001 Oct-Nov;53(4-5):394-401. doi: 10.1007/s002390010229.
Evolution of proteins encoded in nucleotide sequences began with the advent of the triplet code. The chronological order in which amino acids appeared on the evolutionary scene, and the steps in the evolution of the triplet code, have recently been reconstructed (Trifonov, 2000b) on the basis of 40 different ranking criteria and hypotheses. According to the consensus chronology, the pair of complementary codons GGC and GCC, for the amino acids glycine and alanine respectively, appeared first. Other codons appeared as complementary pairs as well, which divided their respective amino acids into two alphabets, encoded by triplets with either central purines or central pyrimidines: G, D, S, E, N, R, K, Q, C, H, Y, and W (Glycine alphabet G) and A, V, P, S, L, T, I, F, and M (Alanine alphabet A). It is speculated that the earliest polypeptide chains were very short, presumably of uniform length, and belonged to two alphabet types encoded in the two complementary strands of the earliest mRNA duplexes. After fusion of the minigenes, a mosaic of the alphabets would form. Traces of the predicted mosaic structure have indeed been detected in the protein sequences of complete prokaryotic genomes, as weak oscillations with a period of 12 residues that reflect the alternation of two types of 6-residue-long units. The next stage of protein evolution corresponded to the closure of the chains into loops of 25-30 residues (Berezovsky et al., 2000). Autocorrelation analysis of the proteins of 23 complete archaebacterial and eubacterial genomes revealed that the preferred distances between valine, alanine, glycine, leucine, and isoleucine along the sequences are in the same range of 25-30 residues, indicating that the loops are primarily closed by hydrophobic interactions between the ends of the loops. The loop closure stage is followed by the formation of typical folds of 100-200 amino acids, via end-to-end fusion of the genes encoding the loop-size chains. This size was apparently dictated by the optimal ring closure for DNA. In both cases, closure into a ring (loop) conferred evolutionarily advantageous stability on the respective structures. Further gene fusions lead to the formation of modern multidomain proteins. Recombinational gene splicing is likely to have appeared after the DNA circularization stage.
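The division into two alphabets by the central base of the codon can be checked directly against the standard genetic code. The sketch below is only an illustration, not the authors' procedure: the names `CODON_TABLE`, `reverse_complement`, and `alphabets_by_central_base` are hypothetical, and the modern standard code (written with T in place of U) is assumed to stand in for the early code.

```python
# Standard genetic code in the conventional TCAG ordering (DNA alphabet, T for U).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(codon):
    """Return the codon read on the complementary strand (5' to 3')."""
    return codon.translate(COMPLEMENT)[::-1]


def alphabets_by_central_base():
    """Group amino acids by whether their codons have a central purine or pyrimidine."""
    purine, pyrimidine = set(), set()
    for codon, aa in CODON_TABLE.items():
        if aa == "*":  # skip stop codons
            continue
        if codon[1] in "AG":
            purine.add(aa)
        else:
            pyrimidine.add(aa)
    return purine, pyrimidine


glycine_alphabet, alanine_alphabet = alphabets_by_central_base()
```

Here `reverse_complement("GGC")` gives `"GCC"`, matching the glycine/alanine codon pair named in the abstract. Serine falls in both sets because it has codons with a central purine (AGT, AGC) and with a central pyrimidine (TCN).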
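The autocorrelation analysis mentioned above looks at how often selected residues occur at a given separation along a sequence. The abstract does not describe the exact procedure or normalization, so the following is only a minimal sketch of the general idea: the function name `distance_histogram`, its parameters, and the cutoff of 50 residues are assumptions made for illustration.

```python
from collections import Counter


def distance_histogram(sequence, residues="VAGLI", max_distance=50):
    """Count separations between every pair of selected residues in one sequence.

    Returns a Counter mapping distance (in residues) to the number of pairs
    found at that distance, for distances up to max_distance.
    """
    positions = [i for i, aa in enumerate(sequence) if aa in residues]
    counts = Counter()
    for idx, start in enumerate(positions):
        for end in positions[idx + 1:]:
            distance = end - start
            if distance > max_distance:
                break  # positions are ascending, so later pairs are farther apart
            counts[distance] += 1
    return counts


# Toy sequence (not real data): selected residues at positions 0, 5, and 10.
toy_counts = distance_histogram("AKKKKVKKKKL")
```

In a genome-scale analysis one would sum such histograms over all proteins and compare them with a suitable background before looking for a preferred range such as 25-30 residues; that step is left out here.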