Zaghloul Lamia, Drillon Guénola, Boulos Rasha E, Argoul Françoise, Thermes Claude, Arneodo Alain, Audit Benjamin
Université de Lyon, F-69000 Lyon, France; Laboratoire de Physique, CNRS UMR 5672, Ecole Normale Supérieure de Lyon, F-69007 Lyon, France.
Centre de Génétique Moléculaire, CNRS UPR 3404, Gif-sur-Yvette, France.
Comput Biol Chem. 2014 Dec;53 Pt A:153-65. doi: 10.1016/j.compbiolchem.2014.08.020. Epub 2014 Aug 27.
Besides their large-scale organization in isochores, mammalian genomes display megabase-sized regions, spanning both genes and intergenes, where the strand nucleotide composition asymmetry decreases linearly, possibly due to replication activity. These so-called skew-N domains cover about a third of the human genome and are bordered by two skew upward jumps that were hypothesized to compose a subset of "master" replication origins active in the germline. Skew-N domains were shown to exhibit a particular gene organization. Genes with CpG-rich promoters likely expressed in the germline are overrepresented near the master replication origins, with large genes being co-oriented with replication fork progression, which suggests some coordination of replication and transcription. In this study, we describe another skew structure that covers ∼13% of the human genome and that is bordered by putative master replication origins similar to the ones flanking skew-N domains. These skew-split-N domains have a shape reminiscent of an N, but split in half, leaving in the center a region of null skew whose length increases with domain size. These central regions (median size ∼860 kb) have a homogeneous composition, i.e. both a null and constant skew and a constant and low GC content. They correspond to heterochromatin gene deserts found in low-GC isochores with an average gene density of 0.81 promoters/Mb as compared to 7.73 promoters/Mb genome wide. The analysis of epigenetic marks and replication timing data confirms that, in these late replicating heterochromatic regions, the initiation of replication is likely to be random. This contrasts with the transcriptionally active euchromatin state found around the bordering well-positioned master replication origins. Altogether, skew-N domains and skew-split-N domains cover about 50% of the human genome.
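As an illustration of the quantity the abstract refers to as the strand nucleotide composition asymmetry, the following minimal sketch computes a skew profile over non-overlapping windows of a DNA sequence. The abstract does not give the formula, so the definition used here, S = (T - A)/(T + A) + (G - C)/(G + C), is an assumption, as are the function name `window_skew` and the 1 kb default window size; none of this is presented as the authors' actual pipeline.

```python
from collections import Counter


def window_skew(seq, window=1000):
    """Return one skew value per non-overlapping window of `seq`.

    Assumed definition (not stated in the abstract):
        S = (T - A) / (T + A) + (G - C) / (G + C)
    Windows with no A/T (or no G/C) contribute 0 for that term.
    A trailing partial window is ignored.
    """
    skews = []
    for start in range(0, len(seq) - window + 1, window):
        counts = Counter(seq[start:start + window].upper())
        a, c, g, t = (counts[base] for base in "ACGT")
        s_ta = (t - a) / (t + a) if (t + a) else 0.0
        s_gc = (g - c) / (g + c) if (g + c) else 0.0
        skews.append(s_ta + s_gc)
    return skews


# Toy example: first window is A/C-rich, second is T/G-rich,
# so the skew is negative and then positive.
profile = window_skew("AAAACCCC" + "TTTTGGGG", window=8)
```

Under this assumed definition, a skew-N domain would appear as a profile that jumps upward at each border and decreases roughly linearly in between, while the central region of a skew-split-N domain would stay close to zero.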