Department of Computational Biology, Graduate School of Frontier Sciences, The University of Tokyo, Kashiwa 277-0882, Japan.
Genome Res. 2012 Aug;22(8):1419-25. doi: 10.1101/gr.140236.112. Epub 2012 Jun 11.
5-methyl-cytosines at CpG sites frequently mutate into thymines, accounting for a large proportion of spontaneous point mutations. If the synthesis of erased gaps around deaminated 5-methyl-cytosines is error-prone, the repair system would leave substantial numbers of errors in neighboring regions. Indeed, we identified an unexpected genome-wide role of the CpG methylation state as a major determinant of proximal natural genetic variation. Specifically, 507 Mbp (∼18%) of the human genome lies within 10 bp of a CpG site; in these regions, the single nucleotide polymorphism (SNP) rate was significantly higher, by ∼50% (P < 10^-566 by a two-proportion z-test), when the neighboring CpG sites were methylated. To reconfirm this finding in another vertebrate, we compared six single-base resolution methylomes in two inbred medaka (Oryzias latipes) strains with sufficient genetic divergence (3.4%). We found that the SNP rate was also higher by ∼50% (P < 10^-2170), and that the substitution rates in all dinucleotides increased simultaneously (P < 10^-441) around methylated CpG sites. In the hypomethylated regions, the "CGCG" motif was significantly enriched (P < 10^-680) and evolutionarily conserved (P ≈ 0.203%), and slow CpG deamination rather than fast CpG gain was seen, indicating a possible role of CGCG as a candidate cis-element for the hypomethylation state. In regions that were hypermethylated in germline-like tissues but hypomethylated in somatic liver cells, the SNP rate was significantly lower than in regions that were hypomethylated in both tissues, suggesting a positive selective pressure during DNA methylation reprogramming. This is the first report showing that the CpG methylation state is significantly correlated with the characteristics of evolutionary change in neighboring DNA.
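The two-proportion z-test named above can be sketched as follows. This is a minimal illustration of the standard pooled test, not the authors' pipeline: the function names and the counts in the usage note are hypothetical and are not data from the study. P values as small as those reported fall below the range of double-precision floats, so the sketch also returns log10(P), using the standard asymptotic normal-tail approximation for large |z| (the cutoff of 30 is an arbitrary choice).

```python
import math


def two_proportion_z(x1, n1, x2, n2):
    """Pooled two-proportion z statistic.

    x1/n1: e.g. SNP count / bases near methylated CpG sites (hypothetical).
    x2/n2: e.g. SNP count / bases near unmethylated CpG sites (hypothetical).
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n1 + 1.0 / n2))
    return (p1 - p2) / se


def log10_p_two_sided(z):
    """log10 of the two-sided normal P value for a z statistic."""
    a = abs(z)
    if a < 30.0:
        # erfc is still representable here: P = erfc(|z| / sqrt(2)).
        return math.log10(math.erfc(a / math.sqrt(2.0)))
    # Asymptotic tail: P ~ 2 * phi(a) / a, evaluated in log space
    # so that extremely small P values do not underflow to zero.
    ln_p = math.log(2.0) - 0.5 * a * a - math.log(a) - 0.5 * math.log(2.0 * math.pi)
    return ln_p / math.log(10.0)
```

For example, with made-up counts `two_proportion_z(150, 1000, 100, 1000)` gives a z of about 3.38, and `log10_p_two_sided` of that value is about -3.1; at genome scale, where the denominators are hundreds of megabases, the same rate difference yields a far larger z, which is why the result is kept in log space.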