Zhang Chun-Ting, Zhang Ren
Department of Physics, Tianjin University, Nankai District, Tianjin 300072, China.
Gene. 2003 Oct 23;317(1-2):127-35. doi: 10.1016/s0378-1119(03)00665-6.
The distribution of G+C content in the human genome has been studied using a windowless technique derived from the Z curve method. The most important findings presented in this paper are twofold. First, abrupt changes are the main pattern of variation of G+C content along human chromosome sequences. At some sites, the G+C content changes abruptly from a G+C-rich region to a G+C-poor region, or vice versa, and these transitions alternate along the sequence. Second, long domains with relatively homogeneous G+C content do exist along each chromosome. These domains are thought to be isochores, which usually have sharp boundaries. Accordingly, 56 isochores longer than 3 Mb have been identified in chromosomes 1-22, X and Y. The boundaries, size and G+C content of each isochore identified are listed in detail. As an example to demonstrate the power of the method, the boundary between the class III and class II isochores of the MHC sequence has been determined and found to be at position 2,477,936, which is in good agreement with the experimental evidence. A homogeneity index is introduced to measure the homogeneity of G+C content in isochores. We emphasize that the homogeneity of G+C content is relative: isochores in which the G+C content is absolutely constant do not exist. Isochore structures appear to be a basic organization of the human genome. Because isochores are relevant to many important biological functions, clarifying their structures will provide much insight into the human genome.
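The abstract does not spell out the windowless technique, so the following is only a minimal sketch of how a Z-curve-based G+C profile might be computed, not the authors' implementation. It assumes the standard z component of the Z curve, z_n = (A_n + T_n) - (G_n + C_n), and a detrended curve z'_n = z_n - k·n with k taken from a least-squares line through the origin. The function names `z_component`, `detrend` and `gc_between` are hypothetical.

```python
def z_component(seq):
    """Cumulative z_n = (A_n + T_n) - (G_n + C_n), with z[0] = 0.

    Assumption: bases other than A, C, G, T (e.g. N) leave z unchanged.
    """
    z = [0]
    for base in seq.upper():
        if base in "AT":
            step = 1
        elif base in "GC":
            step = -1
        else:
            step = 0
        z.append(z[-1] + step)
    return z


def detrend(z):
    """Return (z_prime, k), where z'_n = z_n - k*n.

    Assumption: k is the least-squares slope of a line through the origin.
    Falling stretches of z' are richer in G+C than the sequence average,
    rising stretches are poorer, and turning points suggest boundaries.
    """
    n_sq = sum(n * n for n in range(len(z)))
    k = sum(n * zn for n, zn in enumerate(z)) / n_sq if n_sq else 0.0
    return [zn - k * n for n, zn in enumerate(z)], k


def gc_between(z, start, end):
    """G+C fraction of seq[start:end], read directly from the z curve.

    No sliding window is needed: any two positions give an exact value,
    provided the segment contains only A, C, G and T.
    """
    length = end - start
    return 0.5 * (1.0 - (z[end] - z[start]) / length)


# Toy sequence: a G+C-rich block followed by a G+C-poor block.
z = z_component("GGCC" + "AATT")
z_prime, k = detrend(z)
boundary = min(range(len(z_prime)), key=z_prime.__getitem__)
print(gc_between(z, 0, 4), gc_between(z, 4, 8), boundary)
```

In this toy case the minimum of the detrended curve falls at position 4, where the G+C-rich block ends and the G+C-poor block begins. That is the kind of sharp turning point the abstract describes at isochore boundaries.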