Korotkov Eugene, Zaytsev Konstantin, Fedorov Alexey
Institute of Bioengineering, Federal Research Center of Biotechnology of the Russian Academy of Sciences, 119071 Moscow, Russia.
Bach Institute of Biochemistry, Research Center of Biotechnology of the Russian Academy of Sciences, 119071 Moscow, Russia.
Entropy (Basel). 2022 Apr 30;24(5):632. doi: 10.3390/e24050632.
In this paper, we attempted to find a relation between the living conditions of bacteria and the algorithmic complexity of their genomes. We developed a probabilistic mathematical method for evaluating the irregularity of k-word occurrence (k-words 6 bases long) in bacterial gene coding sequences. For this, the coding sequences from different bacterial genomes were analyzed, and as an index of k-word occurrence irregularity we used W, which has a distribution similar to normal. The results show that bacterial genomes can be divided into two groups of unequal size. The first, smaller group has W in the interval from 170 to 475, while the second has W from 475 to 875. Plant, metazoan and virus genomes also have W in the same interval as the first bacterial group. We suggest that the coding sequences of the second bacterial group are much less susceptible to evolutionary changes than those of the first group. The use of the W index as a biological stress value is also discussed.
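The abstract does not give the formula for W, so the following is only a minimal sketch of the general idea of scoring how unevenly 6-base k-words occur in a coding sequence. It is not the authors' method: the function names `kword_counts` and `irregularity_z`, the uniform expectation for every k-word, and the chi-square rescaling to a rough z-score are all assumptions made for illustration.

```python
from collections import Counter
from itertools import product
from math import sqrt

K = 6  # k-word length mentioned in the abstract


def kword_counts(seq, k=K):
    """Count overlapping k-words that contain only A, C, G, T."""
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if set(word) <= set("ACGT"):  # skip windows with N or other symbols
            counts[word] += 1
    return counts


def irregularity_z(seq, k=K):
    """Illustrative stand-in score, NOT the paper's W.

    Chi-square of observed k-word counts against a uniform expectation
    (an assumption), rescaled with the normal approximation
    (chi2 - df) / sqrt(2 * df).
    """
    counts = kword_counts(seq, k)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no valid k-words")
    n_words = 4 ** k
    expected = total / n_words
    chi2 = sum(
        (counts.get("".join(w), 0) - expected) ** 2 / expected
        for w in product("ACGT", repeat=k)
    )
    df = n_words - 1
    return (chi2 - df) / sqrt(2 * df)
```

Under these assumptions, a sequence dominated by a few repeated k-words scores higher than one in which all k-words occur about equally often. The values are not comparable to the W intervals reported in the abstract.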