McClelland M, Ivarie R
Nucleic Acids Res. 1982 Dec 11;10(23):7865-77. doi: 10.1093/nar/10.23.7865.
The frequency and distribution of the rare dinucleotide CpG were examined in 15 mammalian genes. CpG is highly methylated at cytosine in mammalian DNA (1,2), and 5-methylcytosine (5mC) is thought to undergo a transition mutation via deamination to produce thymine (3). This would result in the accumulation of TpG and CpA and the depletion of CpG during evolution (4). Consistent with this hypothesis, the gene sample of 26,541 dinucleotides contained CpG at 40% of the frequency expected from base composition, and the CpG transition products, TpG+CpA, were significantly elevated at 124% of the expected random frequency. However, because CpG occurs at only 25% of the expected random frequency in the genome, the sampled genes were considerably enriched in this dinucleotide. CpGs were asymmetrically distributed in sequences flanking the genes. 5'-flanking sequences were enriched in CpG at 135% of the frequency expected assuming a symmetrical distribution of all the CpGs in the sampled genes (p < 0.01), while 3'-flanking regions were depleted in CpG at 40% of expected values (p < 0.0001). This asymmetry may reflect the role of 5-methylcytosine in gene expression. In contrast, the frequencies of GpC and GpT+ApC did not differ significantly from those predicted by base composition, and these dinucleotides were not asymmetrically distributed.
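The "percent of expected frequency" figures above compare an observed dinucleotide count with the count predicted from base composition alone. The abstract does not spell out the exact procedure, so the following is only a minimal sketch of one common way to compute such a ratio. It assumes that the expected frequency of a dinucleotide XpY is the product of the single-base frequencies of X and Y, and that overlapping occurrences on a single strand are counted. The function name `obs_exp_ratio` and the toy sequences are hypothetical and are not taken from the paper.

```python
from collections import Counter


def obs_exp_ratio(seq: str, dinuc: str) -> float:
    """Return observed/expected frequency of a dinucleotide in seq.

    Assumption: the expected count is f(X) * f(Y) * (N - 1), where f is the
    single-base frequency in seq and N - 1 is the number of dinucleotide
    positions. This is an illustrative convention, not necessarily the
    authors' exact method.
    """
    seq = seq.upper()
    dinuc = dinuc.upper()
    n = len(seq)
    if n < 2 or len(dinuc) != 2:
        raise ValueError("need a sequence of >= 2 bases and a 2-base dinucleotide")

    base_counts = Counter(seq)

    # Count overlapping occurrences of the dinucleotide along the strand.
    observed = sum(1 for i in range(n - 1) if seq[i:i + 2] == dinuc)

    # Expected count if neighbouring bases were statistically independent.
    expected = (base_counts[dinuc[0]] / n) * (base_counts[dinuc[1]] / n) * (n - 1)
    if expected == 0:
        raise ValueError("dinucleotide contains a base absent from the sequence")

    return observed / expected


if __name__ == "__main__":
    toy = "AACCGGTT"  # hypothetical toy sequence, not real gene data
    print(f"CpG observed/expected: {obs_exp_ratio(toy, 'CG'):.2f}")
    print(f"GpC observed/expected: {obs_exp_ratio(toy, 'GC'):.2f}")
```

Under this convention, a ratio of 0.40 would correspond to the "40% of the frequency expected from base composition" reported for CpG, and a ratio of 1.24 to the 124% reported for TpG+CpA, where the ratio for a pair of dinucleotides would be taken over their summed observed and expected counts.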