Fimmel Elena, Strüngmann Lutz
Institute of Mathematical Biology, Faculty for Computer Sciences, Mannheim University of Applied Sciences, 68163 Mannheim, Germany.
Biosystems. 2023 Nov;233:105009. doi: 10.1016/j.biosystems.2023.105009. Epub 2023 Aug 26.
Nature possesses inherent mechanisms for error detection and correction during the translation of genetic information, as demonstrated by the discovery of a self-complementary circular C-code called X in various organisms such as bacteria, eukaryotes, plasmids, and viruses (Arquès and Michel, 1996; Michel, 2015, 2017). Since then, extensive research has focused on circular codes, which are believed to be remnants of ancient comma-free codes. These codes can be regarded as an additional genetic code specifically optimized for detecting and preserving the proper reading frame in protein-coding sequences. A 2014 study by Fimmel et al. showed that the 216 maximal self-complementary C-codes can be grouped into 27 equivalence classes with eight codes in each class. In this work, we study how the 27 equivalence classes are related to each other. While the codes in each equivalence class obtained by Fimmel et al. in 2014 are permutations of each other, i.e. one code can be obtained from the other by applying a permutation of the bases, it has not been clear how the equivalence classes are connected. We show that there is an ordering of the equivalence classes such that one gets from one class to the next by substituting only one codon/anticodon pair in the corresponding codes, i.e. the corresponding codes have a maximal intersection of 18 codons. To perform this analysis, we define two graphs, G and G′, whose vertices are, respectively, all 216 maximal self-complementary C-codes and the 27 equivalence classes. Several properties of the graphs are obtained. Most surprisingly, it turns out that the graph G′ on the equivalence classes contains Hamiltonian paths of length 27. This fact ultimately leads to a representation of the set of all 216 maximal self-complementary C-codes as a kind of spider web. Finally, we define dinucleotide cuts of such codes by projecting each codon to its first two bases and show that the paths of length 27 can even be chosen so that all the codes contain a special subset of dinucleotides defined by Rumer's roots. These observations raise many new questions about the biological function of such structures.
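The notions used above (self-complementarity, a codon/anticodon substitution leaving an intersection of 18 codons, and the dinucleotide cut) can be illustrated with a minimal Python sketch. The function names are illustrative assumptions and are not taken from the paper; the codon list for X is the one commonly reported in the circular code literature; the set `variant` is a purely hypothetical construction used only to show the intersection count, and it is not claimed to be circular or to be one of the 216 codes.

```python
# Watson-Crick complement of each base (DNA alphabet).
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(codon):
    """Return the anticodon: complement every base and reverse the order."""
    return "".join(COMPLEMENT[base] for base in reversed(codon))


def is_self_complementary(code):
    """A code is self-complementary if it contains the anticodon of each codon."""
    return all(reverse_complement(codon) in code for codon in code)


def dinucleotide_cut(code):
    """Project each codon to its first two bases, as described in the abstract."""
    return {codon[:2] for codon in code}


# The code X as commonly listed in the literature (20 codons, 10 codon/anticodon pairs).
X = frozenset(
    "AAC AAT ACC ATC ATT CAG CTC CTG GAA GAC "
    "GAG GAT GCC GGC GGT GTA GTC GTT TAC TTC".split()
)

# Hypothetical variant: swap the pair {CAG, CTG} for the pair {CAC, GTG}.
# This only illustrates why one substituted pair leaves 18 shared codons;
# it is NOT asserted to be a circular code.
variant = (X - {"CAG", "CTG"}) | {"CAC", "GTG"}

if __name__ == "__main__":
    print("X self-complementary:", is_self_complementary(X))
    print("shared codons after one pair substitution:", len(X & variant))
    print("dinucleotide cut of X:", sorted(dinucleotide_cut(X)))
```

Replacing one codon together with its anticodon removes two of the 20 codons, which is why two codes related in this way share 20 − 2 = 18 codons. Checking circularity itself is not covered by this sketch.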