Elango Navin, Kim Seong-Ho, Vigoda Eric, Yi Soojin V
School of Biology, Georgia Institute of Technology, Atlanta, Georgia, USA.
PLoS Comput Biol. 2008 Feb 29;4(2):e1000015. doi: 10.1371/journal.pcbi.1000015.
Transitions at CpG dinucleotides, referred to as "CpG substitutions", are a major mutational input into vertebrate genomes and a leading cause of human genetic disease. The prevalence of CpG substitutions is due to their mutational origin, which is dependent on DNA methylation. In comparison, other single nucleotide substitutions (for example, those occurring at GpC dinucleotides) mainly arise from errors during DNA replication. Here we analyzed high-quality BAC-based data from human, chimpanzee, and baboon to investigate regional variation of CpG substitution rates. We show that CpG substitutions occur approximately 15 times more frequently than other single nucleotide substitutions in primate genomes, and that they exhibit substantial regional variation. Patterns of CpG rate variation are consistent with differences in methylation level and susceptibility to subsequent deamination. In particular, we propose a "distance-decaying" hypothesis, positing that, due to the molecular mechanism of a CpG substitution, rates are correlated with the stability of double-stranded DNA surrounding each CpG dinucleotide, and that the effect of local DNA stability may decrease with distance from the CpG dinucleotide. Consistent with our "distance-decaying" hypothesis, rates of CpG substitution are strongly (negatively) correlated with regional G+C content. The influence of G+C content decays as the distance from the target CpG site increases. We estimate that the influence of local G+C content extends over a window of up to 1,500 to 2,000 bp centered on each CpG site. We also show that the distance-decaying relationship persists when we control for the effect of long-range homogeneity of nucleotide composition. GpC sites, in contrast, do not exhibit such a "distance-decaying" relationship. Our results highlight an example of the distinctive properties of methylation-dependent substitutions versus substitutions mostly arising from errors during DNA replication.
Furthermore, the negative relationship between G+C content and CpG rates may provide an explanation for the observation that GC-rich SINEs show lower CpG rates than other repetitive elements.
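The abstract does not spell out how the distance-decaying relationship was measured, so the following Python sketch is only one plausible way to probe it, not the authors' procedure. It assumes a hypothetical per-site rate estimate (`rates_by_site`, for example derived from a multi-species alignment) and correlates it with the G+C content of flanking "rings" at increasing distance from each CpG site. All function names and the ring boundaries are illustrative assumptions.

```python
import math

def gc_content(seq):
    """Fraction of G and C among unambiguous bases; None if there are none."""
    seq = seq.upper()
    acgt = sum(seq.count(b) for b in "ACGT")
    if acgt == 0:
        return None
    return (seq.count("G") + seq.count("C")) / acgt

def find_cpg_sites(seq):
    """0-based positions of the C in every CG dinucleotide."""
    seq = seq.upper()
    return [i for i in range(len(seq) - 1) if seq[i:i + 2] == "CG"]

def flank_gc(seq, site, inner, outer):
    """G+C content of both flanks lying between `inner` and `outer` bp
    away from the CpG dinucleotide that starts at `site`."""
    left = seq[max(0, site - outer):max(0, site - inner)]
    right = seq[site + 2 + inner:site + 2 + outer]
    return gc_content(left + right)

def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)

def decay_profile(seq, rates_by_site, rings):
    """For each (inner, outer) ring, correlate flanking G+C content with
    the hypothetical per-site CpG substitution rate."""
    profile = []
    for inner, outer in rings:
        gcs, rates = [], []
        for site, rate in rates_by_site.items():
            gc = flank_gc(seq, site, inner, outer)
            if gc is not None:  # skip sites whose ring falls off the sequence
                gcs.append(gc)
                rates.append(rate)
        profile.append((inner, outer, pearson(gcs, rates)))
    return profile
```

Under the distance-decaying hypothesis, one would expect the correlation returned by `decay_profile` to be most strongly negative for rings closest to the CpG site and to approach zero as the rings move outward, for instance with `rings=[(0, 250), (250, 500), (500, 1000), (1000, 2000)]`. A real analysis would also need to handle rings with constant G+C content or constant rates, for which the correlation is undefined.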