Zhang Lingang, Kasif Simon, Cantor Charles R, Broude Natalia E
Center for Advanced Biotechnology, Boston University, Boston, MA 02215, USA.
Proc Natl Acad Sci U S A. 2004 Nov 30;101(48):16855-60. doi: 10.1073/pnas.0407821101. Epub 2004 Nov 17.
Large-scale analysis of the GC-content distribution at the gene level reveals both common features and basic differences in genomes of different groups of species. Sharp changes in GC content are detected at the transcription boundaries for all species analyzed, including human, mouse, rat, chicken, fruit fly, and worm. However, two substantially distinct groups of GC-content profiles can be recognized: warm-blooded vertebrates including human, mouse, rat, and chicken, and invertebrates including fruit fly and worm. In vertebrates, sharp positive and negative spikes of GC content are observed at the transcription start and stop sites, respectively, and there is also a progressive decrease in GC content from the 5' untranslated region to the 3' untranslated region along the gene. In invertebrates, the positive and negative GC-content spikes at the transcription start and stop sites are preceded by spikes of opposite value, and the highest GC content is found in the coding regions of the genes. Cross-correlation analysis indicates high frequencies of GC-content spikes at transcription start and stop sites. The strong conservation of this genomic feature seen in comparisons of the human/mouse and human/rat orthologs, and the clustering of genes with GC-content spikes on chromosomes imply a biological function. The GC-content spikes at transcription boundaries may reflect a general principle of genomic punctuation. Our analysis also provides means for identifying these GC-content spikes in individual genomic sequences.
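The abstract does not spell out how the GC-content profiles or spikes are computed. The Python sketch below is a rough, hypothetical illustration only. It computes GC content in adjacent fixed-length windows around one boundary position, such as an annotated transcription start or stop site. It then scores the central window against its flanking windows with a z-score. The function names (`gc_fraction`, `gc_profile`, `spike_z`), the window length, the number of flanking windows, and the z-score itself are all assumptions made for illustration. They are not the authors' published procedure.

```python
import math
from statistics import mean, pstdev


def gc_fraction(seq: str) -> float:
    """Fraction of G+C among unambiguous bases (A, C, G, T); NaN if none."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return math.nan
    return (seq.count("G") + seq.count("C")) / acgt


def gc_profile(seq: str, center: int, n_flank: int, window: int) -> dict:
    """GC fraction in 2 * n_flank + 1 adjacent windows of length `window`.

    Window k (k = -n_flank .. n_flank) starts at center + k * window - window // 2,
    so window 0 is centred on `center` (e.g. a TSS or transcription stop site).
    Windows that run off the sequence or contain no A/C/G/T are skipped.
    Window size and flank count are illustrative assumptions, not the paper's.
    """
    profile = {}
    for k in range(-n_flank, n_flank + 1):
        lo = center + k * window - window // 2
        hi = lo + window
        if lo < 0 or hi > len(seq):
            continue  # window runs off the sequence
        gc = gc_fraction(seq[lo:hi])
        if not math.isnan(gc):
            profile[k] = gc
    return profile


def spike_z(profile: dict) -> float:
    """Z-score of the central window's GC content against the flanking windows.

    A hypothetical spike score: positive values indicate a GC-rich spike,
    negative values a GC-poor one. Returns +/-inf when the flanks are
    perfectly uniform but the centre differs from them.
    """
    if 0 not in profile:
        raise ValueError("central window (k = 0) is missing")
    flank = [gc for k, gc in profile.items() if k != 0]
    if len(flank) < 2:
        raise ValueError("need at least two flanking windows")
    diff = profile[0] - mean(flank)
    sd = pstdev(flank)
    if sd == 0:
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return diff / sd


if __name__ == "__main__":
    # Synthetic example: AT-rich flanks around a 50-bp GC-rich block at 500..550.
    seq = "AT" * 250 + "GC" * 25 + "AT" * 250
    prof = gc_profile(seq, center=525, n_flank=5, window=50)
    print(prof[0], prof[1], spike_z(prof))
```

In this framing, a positive score at a transcription start site and a negative score at a stop site would match the vertebrate pattern described above. Deciding whether a given score counts as a spike would still require a threshold, and the abstract does not give one. Averaging such profiles over many genes, as a genome-wide analysis would, is a separate step that is not shown here.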