Chen Xiang-Gui, Hu Jun, Yang Xiao
School of Bioengineering, Xihua University, Chengdu 610039, China.
Yi Chuan. 2008 Sep;30(9):1169-74. doi: 10.3724/sp.j.1005.2008.01169.
GC level is an important feature of genomic composition, and its analysis significantly improves our understanding of the structure, function and evolution of genes. In this paper, the nonredundant DNA sequences of 7,992 human protein-coding genes were retrieved from a public database, and the local GC levels of different sequence regions and the correlations between them were analyzed. The results showed that the GC levels of different sequence regions were strikingly nonuniform. The 5' untranslated regions were the most GC-rich, with an average GC content of 62.5%. The 3' untranslated regions were the most GC-poor, with an average GC content of 43.97%. The GC contents of 3' flanking sequences closely matched the GC levels of the large DNA fragments in which the genes were located. Although the GC contents of open reading frames (ORFs) were higher than those of introns, 3' untranslated regions and 3' flanking sequences, the GC contents of these four regions were highly correlated. The average GC content of the third codon position (GC3) was 58.9%, higher than that of the first and second positions. GC3 was highly correlated with the GC content of ORFs (correlation coefficient 0.91) and was also significantly associated with the GC contents of introns, 3' untranslated regions and 3' flanking sequences. Moreover, the linear regression of GC3 against the GC content of 3' flanking sequences yielded a slope of 1.25. Thus, GC3 is a sensitive indicator of GC changes in the local genome. In contrast, the GC levels of 5' flanking sequences, 5' untranslated regions, and first and second codon positions showed weaker correlations with those of other regions. These results suggest that third codon positions, introns, 3' untranslated regions and 3' flanking sequences may evolve similarly, while first and second codon positions, 5' flanking sequences and 5' untranslated regions are expected to be under stronger selective pressure to maintain their functions.
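The quantities reported in the abstract (regional GC content, GC at each codon position, correlation coefficients and a regression slope) can be illustrated with a minimal Python sketch. This is not the authors' pipeline, which the abstract does not describe; the function names `gc_content`, `codon_position_gc` and `pearson_and_slope` are hypothetical, ambiguous bases are assumed to be ignored, and the ORF is assumed to start in frame.

```python
from math import sqrt


def gc_content(seq):
    """Fraction of G or C among unambiguous bases (A, C, G, T).

    Assumption: ambiguous symbols such as N are excluded from the total.
    """
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    total = gc + seq.count("A") + seq.count("T")
    return gc / total if total else float("nan")


def codon_position_gc(orf):
    """GC fraction at codon positions 1, 2 and 3 of an in-frame ORF.

    The third value is GC3. A trailing partial codon is ignored.
    """
    usable = len(orf) - len(orf) % 3
    return tuple(gc_content(orf[i:usable:3]) for i in range(3))


def pearson_and_slope(x, y):
    """Pearson correlation r and least-squares slope of y regressed on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sqrt(sxx * syy), sxy / sxx


# Toy ORF (not data from the paper): codons ATG GCC GCG TAA.
gc1, gc2, gc3 = codon_position_gc("ATGGCCGCGTAA")

# Toy per-gene values (not data from the paper): x is the GC content of
# the 3' flanking sequence, y is GC3 for the same gene.
r, slope = pearson_and_slope([0.40, 0.50, 0.60], [0.45, 0.575, 0.70])
```

A slope above 1, as in the reported value of 1.25, means that GC3 changes more than the flanking GC content does across genes, which is the sense in which the abstract calls GC3 a sensitive indicator.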