Equipe de Bioinformatique Théorique, FDBT, LSIIT UMR CNRS-ULP 7005, Université de Strasbourg, Pôle API, Boulevard Sébastien Brant, 67400 Illkirch, France.
J Theor Biol. 2010 May 21;264(2):613-22. doi: 10.1016/j.jtbi.2010.02.006. Epub 2010 Feb 11.
A circular code is a set of trinucleotides that allows the reading frames in genes to be retrieved locally, i.e. anywhere in a gene and in particular without start codons, and automatically, using a window of only a few nucleotides. In 1996, a common circular code, called X, was identified in large populations of eukaryotic and prokaryotic genes. Hence, it is believed to be an ancestral structural property of genes. A new computational approach based on comparative genomics is developed to identify essential molecular functions associated with circular codes. It combines three components: a quantitative and sensitive statistical method (FPTF) to identify three permuted trinucleotide sets in the three frames of genes, a flower automaton algorithm to determine whether a trinucleotide set is a circular code, and an integrated Gene Ontology and Taxonomy (iGOT) database. By carrying out automatic circular code analyses on a large number of gene populations, each associated with a particular molecular function, the approach identifies 266 gene populations having circular codes close to X. Surprisingly, their molecular functions include 98% of those covered by the essential genes of the DEG database (Database of Essential Genes). Furthermore, three trinucleotides, GTG, AAG and GCG, which replace three trinucleotides of the code X and are called "evolutionary" trinucleotides, occur significantly in these 266 gene populations. Finally, a new method developed to analyse and quantify the stability of a set of trinucleotides demonstrates that these evolutionary trinucleotides are associated with a significant increase in the stability of the common circular code X. Indeed, among 9920 possible trinucleotide replacement sets, its stability rises from the 1502nd rank to the 16th rank after the replacement by the three evolutionary trinucleotides.
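The circularity test mentioned above can be illustrated with a short Python sketch. It does not reproduce the flower automaton algorithm of the paper. It assumes instead a graph criterion from the later circular code literature: a trinucleotide set is circular if and only if the directed graph with arcs b1 -> b2b3 and b1b2 -> b3, for each trinucleotide b1b2b3, has no cycle. The 20 trinucleotides listed for X are taken from the literature on this code, not from the abstract, and the function names `code_graph`, `is_circular` and `permute` are hypothetical.

```python
# Assumption: X as reported in the circular code literature (not listed in the abstract).
X = {
    "AAC", "AAT", "ACC", "ATC", "ATT", "CAG", "CTC", "CTG", "GAA", "GAC",
    "GAG", "GAT", "GCC", "GGC", "GGT", "GTA", "GTC", "GTT", "TAC", "TTC",
}


def code_graph(code):
    """Build the directed graph of a trinucleotide set as an adjacency dict.

    Each trinucleotide b1b2b3 contributes the arcs b1 -> b2b3 and b1b2 -> b3.
    """
    graph = {}
    for b1, b2, b3 in code:
        graph.setdefault(b1, set()).add(b2 + b3)
        graph.setdefault(b1 + b2, set()).add(b3)
    return graph


def is_circular(code):
    """Return True if the graph of the set is acyclic (assumed criterion)."""
    graph = code_graph(code)
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    colour = {}

    def has_cycle(node):
        colour[node] = GREY
        for nxt in graph.get(node, ()):
            state = colour.get(nxt, WHITE)
            if state == GREY:  # back arc: a cycle exists
                return True
            if state == WHITE and has_cycle(nxt):
                return True
        colour[node] = BLACK
        return False

    return not any(
        colour.get(node, WHITE) == WHITE and has_cycle(node) for node in list(graph)
    )


def permute(code):
    """Circularly permute every trinucleotide: b1b2b3 -> b2b3b1."""
    return {t[1:] + t[0] for t in code}


if __name__ == "__main__":
    print(is_circular(X))               # the common code X
    print(is_circular({"ACG", "CGA"}))  # two circular permutations of one word
```

Under these assumptions, `permute` gives the two shifted sets that, together with X, correspond to the three permuted trinucleotide sets in the three frames. A candidate replacement set can be screened by swapping trinucleotides in `X` and calling `is_circular` again.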