Di Giulio Massimo
Early Evolution of Life Laboratory, Institute of Biosciences and Bioresources, CNR, Via P. Castellino, 111, 80131, Naples, Italy
J Mol Evol. 2014 Jun;78(6):313-20. doi: 10.1007/s00239-014-9626-z. Epub 2014 Jun 12.
Phylogenetic analyses have not reached a clear conclusion about the number of domains of life. In an attempt to improve this situation, I introduce the concept that the amino acids encoded in the genetic code might be markers with outstanding phylogenetic power. In particular, I hypothesise a biosphere populated by, for instance, three groups of organisms whose genetic codes differ because each encodes at least one amino acid that the others do not. Evidently, these amino acids would mark the proteins present in the three groups of organisms unambiguously. In essence, this mark would be none other than the one we usually try to obtain in phylogenetic analyses, in which we transform protein sequences into phylogenetic trees in order to identify, for example, the domains of life. Indeed, this mark would allow proteins to be classified without performing phylogenetic analyses, because the proteins belonging to a group of organisms would be recognisable as naturally marked by at least one amino acid that differs among the groups. This conceptualisation answers the question of how many fundamental kinds of cells have evolved from the Last Universal Common Ancestor (LUCA), as the genetic code has unique properties that make the encoded amino acids excellent phylogenetic markers. The presence of formyl-methionine only in the proteins of bacteria would mark them and would identify Bacteria as a domain of life. On the other hand, the presence of pyrrolysine in the genetic code of the Euryarchaeota would identify them as another fundamental kind of cell that evolved from the LUCA. Overall, the phylogenetic distribution of formyl-methionine and pyrrolysine would identify at least two domains of life, Bacteria and Archaea, but their number might actually be four: Bacteria, Euryarchaeota, archaea that are not Euryarchaeota, and Eukarya. The usually accepted domains of life, Bacteria, Archaea and Eukarya, are not compatible with the phylogenetic distribution of these two amino acids, and this classification might therefore be mistaken.