Chen Ling-Ling, Gao Feng
Laboratory for Computational Biology, Shandong Provincial Research Center for Bioinformatic Engineering and Techniques, Shandong University of Technology, Zibo, China.
FEBS J. 2005 Jul;272(13):3328-36. doi: 10.1111/j.1742-4658.2005.04748.x.
Eukaryotic genomes are composed of isochores, i.e. long sequences that are relatively homogeneous in GC content. In this paper, the isochore structure of the Arabidopsis thaliana genome is studied using a windowless technique based on the Z curve method, and intuitive curves are drawn for all five chromosomes. Using these curves, the GC content can be calculated at any resolution, even at the base level. All five chromosomes are observed to be composed of several alternating GC-rich and AT-rich regions. These regions, named 'isochore-like regions', usually show large fluctuations in GC content. Five isochores with little fluctuation are also observed, and detailed analyses have been performed for them. A GC-rich 'isochore-like region' in chromosome II and a GC-isochore in chromosome IV are the nucleolar organizer regions (NORs), and genes located in these two regions prefer to use GC-ending codons. Another GC-isochore, located in chromosome II, is a mitochondrial DNA insertion region; the position and size of this region are precisely predicted by the current method. The amino acid usage and codon preference of genes in this organellar-to-nuclear transfer region differ significantly from those of other regions. Moreover, the centromeres are located in GC-rich 'isochore-like regions' in all five chromosomes. The current method can provide a useful tool for analyzing whole genomic sequences of eukaryotes.
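The windowless idea can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the standard Z curve component z_n = (A_n + T_n) - (G_n + C_n), where A_n, T_n, G_n and C_n are cumulative base counts up to position n. Under that assumption, the GC fraction of any segment follows from the change in z across it, GC = (1 - Δz/Δn)/2, so no fixed window size is needed. The function names `z_curve_at_gc` and `gc_content` are hypothetical, and the input is assumed to contain only A, C, G and T.

```python
from itertools import accumulate


def z_curve_at_gc(seq):
    """Return the cumulative AT-minus-GC curve z, with z[0] = 0.

    z[n] = (A_n + T_n) - (G_n + C_n) over the first n bases.
    Assumption: the sequence contains only A, C, G and T.
    """
    steps = []
    for base in seq.upper():
        if base in "AT":
            steps.append(1)
        elif base in "GC":
            steps.append(-1)
        else:
            raise ValueError(f"unsupported base: {base!r}")
    return [0] + list(accumulate(steps))


def gc_content(z, start, end):
    """GC fraction of seq[start:end], read directly off the curve.

    A segment of length L with g G/C bases changes z by L - 2g,
    so g / L = (1 - delta_z / L) / 2.
    """
    length = end - start
    if length <= 0:
        raise ValueError("end must be greater than start")
    return (1 - (z[end] - z[start]) / length) / 2


# Toy sequence (made up for illustration, not Arabidopsis data).
z = z_curve_at_gc("GGCCAATTAT")
print(gc_content(z, 0, 4))   # GC-rich segment
print(gc_content(z, 4, 10))  # AT-rich segment
print(gc_content(z, 0, 1))   # single-base resolution
```

Because the curve is computed once, any interval, including a single base, can be queried in constant time. A falling stretch of the curve marks a GC-rich region and a rising stretch an AT-rich one.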