Dila Gopal, Michel Christian J, Poch Olivier, Ripp Raymond, Thompson Julie D
CSTB, ICube, CNRS, University of Strasbourg, 300 Boulevard Sébastien Brant, 67400, Illkirch, France.
Biosystems. 2019 Jan;175:57-74. doi: 10.1016/j.biosystems.2018.10.014. Epub 2018 Oct 24.
In the genes of bacteria, archaea, eukaryotes, plasmids and viruses, a set X of 20 trinucleotides has been found to have a higher average occurrence in the reading frame than in either of the two shifted frames (Michel, 2015, 2017; Arquès and Michel, 1996). This set X has an interesting mathematical property: it is a maximal C3 self-complementary trinucleotide circular code (Arquès and Michel, 1996). Furthermore, any motif obtained from this circular code X has the capacity to retrieve, maintain and synchronize the reading frame in genes. A recent study of X motifs in the complete genome of the yeast Saccharomyces cerevisiae showed that they are significantly enriched in the reading frame of the genes (protein-coding regions) of the genome (Michel et al., 2017). It was suggested that these X motifs may be evolutionary relics of a primitive code originally used for gene translation. The aim of this paper is to address two questions: are X motifs conserved during evolution, and do they continue to play a functional role in the processes of genome decoding and protein production? In a large-scale analysis involving the complete genomes of four mammals and nine different yeast species, we highlight specific evolutionary pressures on the X motifs in the genes of all the genomes, and identify important new properties of X motif conservation at the level of the encoded amino acids. We then compare the occurrence of X motifs with existing experimental data on protein expression and protein production, and report a significant correlation between the number of X motifs in a gene and increased protein abundance. More generally, this work suggests that motifs from circular codes, i.e. motifs having the property of reading frame retrieval, may represent functional elements located within the coding regions of extant genomes.
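The frame preference and self-complementarity described in the abstract can be illustrated with a short Python sketch. It lists the 20 trinucleotides of X as given by Arquès and Michel (1996), checks that X is self-complementary (the reverse complement of every trinucleotide in X is also in X), counts X trinucleotides in the reading frame and the two shifted frames of a sequence, and reports runs of consecutive in-frame X trinucleotides. The function names and the minimum run length `min_codons=4` used here to call a run an "X motif" are hypothetical choices for illustration and may differ from the motif definition used in this study; the sketch does not test the circular code or C3 properties themselves.

```python
from __future__ import annotations

# The 20 trinucleotides of the circular code X (Arquès and Michel, 1996).
X: frozenset[str] = frozenset({
    "AAC", "AAT", "ACC", "ATC", "ATT", "CAG", "CTC", "CTG", "GAA", "GAC",
    "GAG", "GAT", "GCC", "GGC", "GGT", "GTA", "GTC", "GTT", "TAC", "TTC",
})

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(word: str) -> str:
    """Return the reverse complement of a DNA word."""
    return word.translate(_COMPLEMENT)[::-1]


def is_self_complementary(code: frozenset[str]) -> bool:
    """True if the reverse complement of every word is also in the code."""
    return all(reverse_complement(w) in code for w in code)


def frame_counts(seq: str, code: frozenset[str] = X) -> list[int]:
    """Count trinucleotides of `code` in frames 0, 1 and 2 of `seq`.

    Frame 0 is the reading frame; frames 1 and 2 are the shifted frames.
    """
    seq = seq.upper()
    counts = []
    for frame in range(3):
        words = (seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
        counts.append(sum(1 for w in words if w in code))
    return counts


def find_x_motifs(seq: str, min_codons: int = 4,
                  code: frozenset[str] = X) -> list[tuple[int, int]]:
    """Return (start, end) of maximal in-frame runs of `code` trinucleotides.

    HYPOTHETICAL motif rule: a run counts as a motif only if it contains at
    least `min_codons` consecutive trinucleotides from `code` in frame 0.
    """
    seq = seq.upper()
    last = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    motifs: list[tuple[int, int]] = []
    run_start = None
    for i in range(0, last, 3):
        if seq[i:i + 3] in code:
            if run_start is None:
                run_start = i  # a new run begins here
        else:
            if run_start is not None and (i - run_start) // 3 >= min_codons:
                motifs.append((run_start, i))
            run_start = None
    # Close a run that extends to the end of the sequence.
    if run_start is not None and (last - run_start) // 3 >= min_codons:
        motifs.append((run_start, last))
    return motifs
```

For the toy sequence `"GAAGTCCTGAAC"`, `frame_counts` returns `[4, 0, 1]`: all four reading-frame trinucleotides belong to X, while the shifted frames contain few or none, which is the kind of frame bias the abstract describes at genome scale. Counting `len(find_x_motifs(cds))` over a set of coding sequences would give a per-gene motif count of the kind that could be compared with protein abundance data, under the hypothetical motif rule above.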