The maximal C(3) self-complementary trinucleotide circular code X in genes of bacteria, eukaryotes, plasmids and viruses.

Author information

Michel Christian J

Affiliation

Theoretical Bioinformatics, ICube, University of Strasbourg, CNRS, 300 Boulevard Sébastien Brant, 67400 Illkirch, France.

Publication information

J Theor Biol. 2015 Sep 7;380:156-77. doi: 10.1016/j.jtbi.2015.04.009. Epub 2015 Apr 29.

Abstract

In 1996, a set X of 20 trinucleotides was identified in genes of both prokaryotes and eukaryotes; on average, X occurs most often in the reading frame compared with the two shifted frames (Arquès and Michel, 1996). This set X also has an interesting mathematical property: it is a maximal C(3) self-complementary trinucleotide circular code (Arquès and Michel, 1996). By 2014, the number of trinucleotides available in prokaryotic genes had been multiplied by a factor of 527. In addition, two new gene kingdoms, plasmids and viruses, now contain enough trinucleotide data to be analysed. The approach used in 1996 to identify a preferential frame for a trinucleotide is quantified here with a new definition that analyses the occurrence probability of a complementary/permutation (CP) trinucleotide set in a gene kingdom. To increase the statistical significance of the results compared with those of 1996, the circular code X is also studied in several gene taxonomic groups within each kingdom. Based on this new statistical approach, the circular code X is strengthened in genes of prokaryotes and eukaryotes, and is now also identified in genes of plasmids. A subset of X with 18 or 16 trinucleotides is identified in genes of viruses. Furthermore, a simple probabilistic model based on the independent occurrence of trinucleotides in the reading frame of genes explains the circular code frequencies and asymmetries observed in the shifted frames in all studied gene kingdoms. Finally, the developed approach allows the identification of variant X codes in genes, i.e. trinucleotide codes which differ from X. In genes of bacteria, eukaryotes and plasmids, 14 of the 47 studied gene taxonomic groups (about 30%) have variant X codes. Seven variant X codes are identified with at least 16 trinucleotides of X. Two variant X codes, XA in cyanobacteria and plasmids of cyanobacteria, and XD in birds, are self-complementary and without permuted trinucleotides, but non-circular. Five variant X codes, XB in deinococcus, plasmids of chloroflexi and deinococcus, mammals and kinetoplasts, XC in elusimicrobia and apicomplexans, XE in fishes, XF in insects, and XG in basidiomycetes and plasmids of spirochaetes, are C(3) self-complementary circular. In genes of viruses, no variant X code is found.
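
The properties named in the abstract (self-complementary, circular, C(3)) and the idea of comparing occurrences across the three frames can be illustrated with a short sketch. It rests on several assumptions that are not in the abstract: the list of 20 trinucleotides is the one commonly reported for X in the circular-code literature, the circularity check uses a graph-acyclicity criterion from later work on circular codes rather than the method of this paper, and all function names and the toy sequence are hypothetical. The paper's statistical definition based on CP trinucleotide sets is not reproduced here.

```python
# Assumption: X as commonly listed in the circular-code literature;
# the abstract itself does not enumerate the 20 trinucleotides.
X = frozenset(
    "AAC AAT ACC ATC ATT CAG CTC CTG GAA GAC "
    "GAG GAT GCC GGC GGT GTA GTC GTT TAC TTC".split()
)

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(t):
    """Complement each letter and reverse, e.g. AAC -> GTT."""
    return t.translate(COMPLEMENT)[::-1]


def permute(t):
    """Circular permutation l1 l2 l3 -> l2 l3 l1."""
    return t[1:] + t[0]


def is_self_complementary(code):
    """True if the code contains the reverse complement of each member."""
    return all(reverse_complement(t) in code for t in code)


def is_circular(code):
    """Graph criterion (assumed, from later literature): for each
    trinucleotide b1b2b3 add edges b1 -> b2b3 and b1b2 -> b3; the code
    is circular exactly when this directed graph has no cycle."""
    graph = {}
    for t in code:
        graph.setdefault(t[0], set()).add(t[1:])
        graph.setdefault(t[:2], set()).add(t[2])
    state = {}  # node -> "active" (on the DFS stack) or "done"

    def has_cycle(node):
        if state.get(node) == "active":
            return True
        if state.get(node) == "done":
            return False
        state[node] = "active"
        if any(has_cycle(nxt) for nxt in graph.get(node, ())):
            return True
        state[node] = "done"
        return False

    return not any(has_cycle(node) for node in list(graph))


def is_c3(code):
    """C(3): the code and its two circular permutations are all circular."""
    x1 = {permute(t) for t in code}
    x2 = {permute(t) for t in x1}
    return is_circular(code) and is_circular(x1) and is_circular(x2)


def frame_counts(gene, code):
    """Number of trinucleotides of `code` read in frames 0, 1 and 2."""
    counts = []
    for shift in range(3):
        codons = (gene[i:i + 3] for i in range(shift, len(gene) - 2, 3))
        counts.append(sum(1 for c in codons if c in code))
    return counts


if __name__ == "__main__":
    print(len(X), is_self_complementary(X), is_c3(X))
    print(frame_counts("GAACTGGTC", X))  # toy sequence, not real gene data
```

On real gene sets, `frame_counts` would be applied per gene and aggregated per taxonomic group; the raw counts here are only a rough stand-in for the occurrence probabilities analysed in the paper.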
