Oliver José L, Carpena Pedro, Román-Roldán Ramón, Mata-Balaguer Trinidad, Mejías-Romero Andrés, Hackenberg Michael, Bernaola-Galván Pedro
Departamento de Genética, Instituto de Biotecnología, Universidad de Granada, Granada, Spain.
Gene. 2002 Oct 30;300(1-2):117-27. doi: 10.1016/s0378-1119(02)01034-x.
The human genome is a mosaic of isochores, which are long DNA segments (≫300 kbp) relatively homogeneous in G+C. Human isochores were first identified by density-gradient ultracentrifugation of bulk DNA, and differ in important features, e.g. genes are found predominantly in the GC-richest isochores. Here, we use a reliable segmentation method to partition the longest contigs in the human genome draft sequence into long homogeneous genome regions (LHGRs), thereby revealing the isochore structure of the human genome. The advantages of the isochore maps presented here are: (1) sequence heterogeneities at different scales are shown in the same plot; (2) pair-wise compositional differences between adjacent regions are all statistically significant; (3) isochore boundaries are accurately defined to single base pair resolution; and (4) both gradual and abrupt isochore boundaries are simultaneously revealed. Taking advantage of the wide sample of genome sequence analyzed, we investigate the correspondence between LHGRs and true human isochores revealed through DNA centrifugation. LHGRs show many of the typical isochore features, mainly size distribution, G+C range, and proportions of the isochore classes. The relative density of genes, Alu and long interspersed nuclear element repeats and the different types of single nucleotide polymorphisms on LHGRs also coincide with expectations in true isochores. Potential applications of isochore maps range from the improvement of gene-finding algorithms to the prediction of linkage disequilibrium levels in association studies between marker genes and complex traits. The coordinates for the LHGRs identified in all the contigs longer than 2 Mb in the human genome sequence are available at the online resource on isochore mapping: http://bioinfo2.ugr.es/isochores.
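To make the idea of partitioning a sequence into compositionally homogeneous regions concrete, the following is a minimal illustrative sketch, not the authors' implementation. The abstract does not name the segmentation method, so this sketch assumes a simple entropy-based criterion: it finds the single cut point that maximizes the Jensen-Shannon divergence between the G+C composition of the two resulting sides. The function names `gc_fraction`, `binary_entropy`, and `best_split` are hypothetical, and the statistical significance test and recursive application that a full segmentation would need are deliberately left out.

```python
import math


def gc_fraction(seq):
    """Fraction of G or C among unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    return gc / total if total else 0.0


def binary_entropy(p):
    """Shannon entropy (bits) of a two-state distribution (p, 1 - p)."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def best_split(seq, min_len=1):
    """Return (position, divergence) for the cut that maximizes the
    Jensen-Shannon divergence between the G+C composition of the left
    and right parts. Assumption: every base is treated as either
    G/C or not-G/C, so ambiguous symbols count as not-G/C."""
    is_gc = [1 if base in "GC" else 0 for base in seq.upper()]
    n = len(is_gc)
    if n < 2:
        return None, 0.0
    total_gc = sum(is_gc)
    h_whole = binary_entropy(total_gc / n)
    best_pos, best_div = None, 0.0
    left_gc = 0
    for i in range(1, n):
        left_gc += is_gc[i - 1]  # G+C count in seq[:i]
        if i < min_len or n - i < min_len:
            continue
        h_left = binary_entropy(left_gc / i)
        h_right = binary_entropy((total_gc - left_gc) / (n - i))
        # Entropy of the whole minus the length-weighted entropies of the parts.
        div = h_whole - (i / n) * h_left - ((n - i) / n) * h_right
        if div > best_div:
            best_pos, best_div = i, div
    return best_pos, best_div


if __name__ == "__main__":
    toy = "AT" * 50 + "GC" * 50  # AT-rich block followed by a GC-rich block
    print(gc_fraction(toy), best_split(toy))
```

On the toy sequence the cut falls at the junction between the AT-rich and GC-rich blocks. A real isochore segmentation would additionally test whether each candidate cut is statistically significant before accepting it, and would then repeat the procedure on each resulting segment.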