Duret L, Galtier N
Laboratoire de Biométrie, Génétique et Biologie des Populations, Université Claude Bernard, Villeurbanne, France.
Mol Biol Evol. 2000 Nov;17(11):1620-5. doi: 10.1093/oxfordjournals.molbev.a026261.
CpG and TpA dinucleotides are underrepresented in the human genome. The CpG deficiency is due to the high rate of C-to-T mutation in methylated CpGs. The TpA suppression was thought to reflect counterselection against the destabilizing effect of TpA in RNA. Unexpectedly, the TpA and CpG deficiencies vary with the G+C content of sequences. It has been proposed that the variation in CpG suppression is correlated with a particular chromatin organization in G+C-rich isochores. Here, we present an improved model of dinucleotide evolution that accounts for the overlap between successive dinucleotides. We show that an increased mutation rate from CpG to TpG or CpA induces both an apparent TpA deficiency and a correlation between the CpG and TpA deficiencies and G+C content. Moreover, this model shows that the ratio of observed to expected CpG frequency underestimates the real CpG deficiency in G+C-rich sequences. The predictions of our model fit well with observed frequencies in human genomic data. This study suggests that previously published selectionist interpretations of dinucleotide frequency patterns should be treated with caution. Finally, we propose new criteria for identifying unmethylated CpG islands that take into account this bias in the measure of CpG depletion.
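For readers unfamiliar with the observed-over-expected ratio that the abstract refers to, it is conventionally computed as f(XY) / (f(X) · f(Y)), where f denotes a frequency in the sequence. The sketch below illustrates only that conventional measure, which the abstract argues is biased in G+C-rich sequences; it is not the authors' improved model. The function names `dinucleotide_obs_exp` and `gc_content`, and the choice to normalize the dinucleotide count by the n − 1 overlapping positions, are assumptions made for illustration.

```python
from collections import Counter


def gc_content(seq: str) -> float:
    """Fraction of bases in seq that are G or C (hypothetical helper)."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    return (seq.count("G") + seq.count("C")) / len(seq)


def dinucleotide_obs_exp(seq: str, dinuc: str) -> float:
    """Conventional observed/expected ratio f(XY) / (f(X) * f(Y)).

    Assumption: f(XY) is measured over the n - 1 overlapping
    dinucleotide positions, and f(X), f(Y) over the n bases.
    """
    seq = seq.upper()
    if len(dinuc) != 2:
        raise ValueError("dinuc must be exactly two bases, e.g. 'CG'")
    x, y = dinuc.upper()
    n = len(seq)
    if n < 2:
        raise ValueError("sequence must contain at least two bases")

    base_counts = Counter(seq)
    # Count the dinucleotide at every position, allowing overlaps.
    observed = sum(1 for i in range(n - 1) if seq[i] == x and seq[i + 1] == y)

    f_xy = observed / (n - 1)
    f_x = base_counts[x] / n
    f_y = base_counts[y] / n
    if f_x == 0 or f_y == 0:
        raise ValueError("expected frequency is zero for this sequence")
    return f_xy / (f_x * f_y)


if __name__ == "__main__":
    example = "ATGCGCGTATACGTTAGCCGTA"  # made-up sequence, not real data
    print("G+C content:", gc_content(example))
    print("CpG obs/exp:", dinucleotide_obs_exp(example, "CG"))
    print("TpA obs/exp:", dinucleotide_obs_exp(example, "TA"))
```

A ratio below 1 indicates that the dinucleotide is rarer than its base composition alone would predict. According to the abstract, this ratio should be compared across sequences of different G+C content with care, because it underestimates the real CpG deficiency in G+C-rich sequences.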