Department of Computer Science and Engineering, Tezpur University, Tezpur, Assam 784 028, India.
Department of Statistics, Darrang College, Tezpur, Assam 784001, India.
Gene. 2014 Feb 15;536(1):18-28. doi: 10.1016/j.gene.2013.11.098. Epub 2013 Dec 11.
It has been reported earlier that the relative di-nucleotide frequency (RDF) is similar across different parts of a genome but varies among genomes; RDF has therefore been termed the genome signature in bacteria. It is not known whether the constancy of RDF is governed by genome-wide mutational bias or by selection. Here we performed a comparative analysis of RDF between the inter-genic and the coding sequences of seventeen bacterial genomes for which gene expression data were available. Within a genome, the constraint on di-nucleotides was higher in the coding sequences than in the inter-genic regions, and higher at the 2nd codon position than at the 3rd. Further analysis revealed that the constraint on di-nucleotides at the 2nd codon position is greater in the highly expressed genes (HEG) than in the whole genome or in the lowly expressed genes (LEG). We then analyzed RDF at the 2nd and the 3rd codon positions in simulated coding sequences, generated computationally with the codon usage bias (CUB) set according to the genome G+C composition and with the amino acid sequence left unaltered. In the simulated coding sequences, the observed constraint was significantly lower, and the HEG and the LEG did not differ significantly in di-nucleotide constraint. This indicates that the greater di-nucleotide constraint in the HEG arises from stronger selection on CUB in these genes than in the LEG of the same genome. Further, a comparative analysis of RDF in the HEG rpoB and rpoC of 199 bacteria revealed a common pattern of constraint on di-nucleotides at the 2nd codon position across these bacteria. To validate the role of CUB in di-nucleotide constraint, we analyzed RDF at the 2nd and the 3rd codon positions in simulated rpoB/rpoC sequences. This analysis revealed that selection on CUB is an important determinant of the di-nucleotide constraint at these positions in bacterial genomes. We believe that this study provides major findings on the role of CUB in di-nucleotide constraint in bacterial genomes.
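The two computations the abstract relies on, RDF at chosen codon positions and CUB-preserving simulated coding sequences, can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' pipeline. It uses the common genome-signature definition rho_XY = f_XY / (f_X f_Y). It assumes that the "2nd codon position" di-nucleotide spans codon positions 2-3 and the "3rd position" di-nucleotide spans position 3 and position 1 of the next codon. The simulation is an assumed null model: each codon is re-drawn among its synonyms, weighted by a G+C value at the third base, with the amino acid sequence kept fixed. The helper names (`rdf`, `simulate_cds`, `mean_abs_deviation`) are hypothetical, and `mean_abs_deviation` is only a placeholder constraint score; the abstract does not define its constraint measure.

```python
import math
import random
from collections import Counter
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI table 1); codons enumerated in TCAG order.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
SYNONYMS = {}
for codon, aa in CODON_TABLE.items():
    SYNONYMS.setdefault(aa, []).append(codon)


def dinucleotide_pairs(seq, codon_pos=None):
    """All overlapping pairs, or (assumption) pairs whose first base sits at codon_pos (1-3)."""
    if codon_pos is None:
        return [seq[i:i + 2] for i in range(len(seq) - 1)]
    return [seq[i:i + 2] for i in range(codon_pos - 1, len(seq) - 1, 3)]


def rdf(seq, codon_pos=None):
    """rho_XY = f_XY / (f_X * f_Y), with f_X and f_Y taken from the first and
    second bases of the sampled pairs; NaN when X or Y never occurs."""
    valid = set(BASES)
    pairs = [p for p in dinucleotide_pairs(seq.upper(), codon_pos) if set(p) <= valid]
    if not pairs:
        raise ValueError("no valid dinucleotides")
    n = len(pairs)
    pair_counts = Counter(pairs)
    first = Counter(p[0] for p in pairs)
    second = Counter(p[1] for p in pairs)
    rho = {}
    for x, y in product(BASES, repeat=2):
        fx, fy = first[x] / n, second[y] / n
        rho[x + y] = (pair_counts[x + y] / n) / (fx * fy) if fx and fy else float("nan")
    return rho


def mean_abs_deviation(rho):
    """Hypothetical constraint score: mean |rho - 1| over defined values."""
    vals = [abs(v - 1) for v in rho.values() if not math.isnan(v)]
    return sum(vals) / len(vals)


def simulate_cds(cds, gc, rng=None):
    """Assumed null model: keep each amino acid (stops stay stops) but re-draw
    the codon among synonyms, weight gc/2 if the third base is G/C, else (1-gc)/2.
    Codons with ambiguous bases are kept; a trailing incomplete codon is dropped."""
    rng = rng or random.Random()
    cds = cds.upper()
    out = []
    for i in range(0, len(cds) - len(cds) % 3, 3):
        codon = cds[i:i + 3]
        aa = CODON_TABLE.get(codon)
        if aa is None:
            out.append(codon)
            continue
        syn = SYNONYMS[aa]
        weights = [gc / 2 if c[2] in "GC" else (1 - gc) / 2 for c in syn]
        if sum(weights) == 0:  # e.g. ATG when gc == 0: fall back to uniform
            weights = [1.0] * len(syn)
        out.append(rng.choices(syn, weights=weights)[0])
    return "".join(out)
```

Comparing, for example, `mean_abs_deviation(rdf(cds, codon_pos=2))` for real genes against the same score for `simulate_cds(cds, gc)` mirrors the kind of real-versus-simulated comparison described above. The weighting scheme and the score are illustrative choices only.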