Wen Sheng-Yun, Zhang Chun-Ting
Department of Physics, Tianjin University, PR China.
Biochem Biophys Res Commun. 2003 Nov 7;311(1):215-22. doi: 10.1016/j.bbrc.2003.09.198.
Wavelet multiresolution (also known as multiscale) analysis, combined with the Z curve method, is proposed for identifying the boundaries of isochores in the human genome. The human MHC sequence and the longest contigs of human chromosomes 21 and 22 are used as examples. The boundary between the Class III and Class II isochores in the MHC sequence was detected at position 2,490,368 bp, in good agreement with the experimental evidence. An isochore about 7 Mb long was identified in chromosome 21 and found to be gene- and Alu-poor. The G+C content of chromosome 21 was also found to be more homogeneous than that of chromosome 22. Compared with window-based methods, the present method offers the highest resolution for identifying isochore boundaries, down to the scale of a single base. Compared with the entropic segmentation method, it is more intuitive and requires less computation. The important conclusion of this study is that segmentation points, at which the G+C content changes relatively abruptly, do exist in the human genome. These 'singularity' points may be considered candidates for isochore boundaries in the human genome. The method is general and can be used to analyze any other genome.
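The idea in the abstract, locating points where the G+C content changes abruptly by examining a Z curve component at several scales, can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the standard Z curve component z_n = (A_n + T_n) - (G_n + C_n) and substitutes a simple Haar-type difference of window means for the wavelet analysis used in the paper. The function names (`z_component`, `haar_details`, `candidate_boundary`), the scales, and the toy sequence are all hypothetical.

```python
from itertools import accumulate


def z_component(seq):
    """Return z_0..z_N, where z_n = (A_n + T_n) - (G_n + C_n) over the first n bases."""
    # Each A/T raises z by 1, each G/C lowers it by 1; other symbols (e.g. N) add 0.
    steps = [1 if b in "AT" else -1 if b in "GC" else 0 for b in seq.upper()]
    return [0] + list(accumulate(steps))


def haar_details(z, scale):
    """Haar-type detail at each position p: mean step over the next `scale`
    bases minus the mean step over the previous `scale` bases."""
    return {
        p: ((z[p + scale] - z[p]) - (z[p] - z[p - scale])) / scale
        for p in range(scale, len(z) - scale)
    }


def candidate_boundary(seq, scale):
    """Return (p, gc_jump) where |detail| is largest at this scale.

    p counts the bases to the left of the candidate boundary; gc_jump is the
    G+C fraction of the right window minus that of the left window.
    Assumes len(seq) >= 2 * scale (illustrative sketch, no error handling).
    """
    details = haar_details(z_component(seq), scale)
    p = max(details, key=lambda q: abs(details[q]))
    # The mean step equals 1 - 2 * (G+C fraction), so a detail d is a G+C jump of -d / 2.
    return p, -details[p] / 2


# Synthetic example: 400 bp with no G+C followed by 400 bp of pure G+C.
toy = "AT" * 200 + "GC" * 200
for scale in (16, 64, 256):
    print(scale, candidate_boundary(toy, scale))
```

On this toy sequence all three scales place the candidate boundary after base 400. Because the detail is evaluated at every base position instead of once per window, the boundary is located at single-base resolution, which is the property the abstract contrasts with window-based methods.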