Circular code identified by the codon usage.

Affiliation

Theoretical bioinformatics, ICube, University of Strasbourg, C.N.R.S., 300 Boulevard Sébastien Brant, 67400 Illkirch, France.

Publication information

Biosystems. 2024 Oct;244:105308. doi: 10.1016/j.biosystems.2024.105308. Epub 2024 Aug 17.

Abstract

Since 1996, circular codes in genes have been identified thanks to the development of 6 statistical approaches: trinucleotide frequencies per frame (Arquès and Michel, 1996), correlation functions per frame (Arquès and Michel, 1997), frame permuted trinucleotide frequencies (Frey and Michel, 2003, 2006), advanced statistical functions at the gene population level (Michel, 2015) and at the gene level (Michel, 2017). All these 3-frame statistical methods analyse the trinucleotide information in the 3 frames of genes: the reading frame and the 2 shifted frames. Notably, codon usage does not allow for the identification of circular codes (Michel, 2020). This has been a long-standing problem since 1996, hindering biologists' access to circular code theory. By considering circular code conditions resulting from code theory, particularly the concept of permutation class, and building upon previous statistical work, a new statistical approach based solely on the codon usage, i.e. a 1-frame statistical method, surprisingly reveals the maximal C³ self-complementary trinucleotide circular code X in bacterial genes and in average (bacterial, archaeal, eukaryotic) genes, and almost in archaeal genes. Additionally, a new parameter definition indicates that bacterial and archaeal genes exhibit codon usage dispersion of the same order of magnitude, but significantly higher than that observed in eukaryotic genes. This statistical finding may explain the greater variability of codes in eukaryotic genes compared to bacterial and archaeal genes, an issue that has been open for many years. Finally, biologists can now search for new (variant) circular codes at both the genome level (across all genes in a given genome) and the gene level using only codon usage, without the need for analysing the shifted frames.
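
The permutation-class condition mentioned in the abstract can be illustrated with a short sketch. It is not the statistical method of the paper, which the abstract does not spell out. It only shows the combinatorial background: the 60 non-periodic trinucleotides split into 20 classes under circular permutation, and a circular code can contain at most one trinucleotide per class. The function names (`permutation_class`, `one_per_class`, `codon_usage`, and so on) are hypothetical and chosen for illustration. The set `X` is the code reported by Arquès and Michel (1996). Having one trinucleotide per class is a necessary condition for a maximal circular code, not a sufficient one.

```python
from collections import Counter
from itertools import product

BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")

# The maximal self-complementary circular code X (Arquès and Michel, 1996).
X = frozenset(
    "AAC AAT ACC ATC ATT CAG CTC CTG GAA GAC "
    "GAG GAT GCC GGC GGT GTA GTC GTT TAC TTC".split()
)

def permutation_class(t):
    """Return the set of circular permutations of trinucleotide t."""
    return frozenset({t, t[1:] + t[0], t[2:] + t[:2]})

def reverse_complement(t):
    """Return the reverse complement of a DNA word."""
    return t.translate(COMPLEMENT)[::-1]

def permutation_classes():
    """Return the 20 classes of non-periodic trinucleotides."""
    all_t = ("".join(p) for p in product(BASES, repeat=3))
    # AAA, CCC, GGG and TTT are periodic and cannot belong to a circular code.
    return {permutation_class(t) for t in all_t if len(set(t)) > 1}

def one_per_class(code):
    """Necessary (not sufficient) condition for a maximal circular code."""
    return all(len(code & cls) == 1 for cls in permutation_classes())

def is_self_complementary(code):
    """True if the code contains the reverse complement of each member."""
    return all(reverse_complement(t) in code for t in code)

def codon_usage(seq):
    """Count codons in the reading frame only (a 1-frame statistic)."""
    seq = seq.upper()
    codons = (seq[i:i + 3] for i in range(0, len(seq) - 2, 3))
    # Skip codons containing ambiguous symbols such as N.
    return Counter(c for c in codons if set(c) <= set(BASES))
```

As an assumed usage, one could sum `codon_usage` over the genes of a genome and then compare the three counts inside each permutation class. How the paper turns such counts into a decision is not described in the abstract, so that step is left out here.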
