Saxonov Serge, Berg Paul, Brutlag Douglas L
BioMedical Informatics Program, Stanford University, Stanford, CA 94305, USA.
Proc Natl Acad Sci U S A. 2006 Jan 31;103(5):1412-7. doi: 10.1073/pnas.0510310103. Epub 2006 Jan 23.
A striking feature of the human genome is the dearth of CpG dinucleotides (CpGs), interrupted occasionally by CpG islands (CGIs), regions with relatively high content of the dinucleotide. CGIs are generally associated with promoters; genes whose promoters are especially rich in CpG sequences tend to be expressed in most tissues. However, all working definitions of what constitutes a CGI rely on ad hoc thresholds. Here we adopt a direct and comprehensive survey to identify the locations of all CpGs in the human genome and find that promoters segregate naturally into two classes by CpG content. Seventy-two percent of promoters belong to the class with high CpG content (HCG), and 28% are in the class whose CpG content is characteristic of the overall genome (low CpG content). The enrichment of CpGs in the HCG class is symmetric and peaks around the core promoter. The broad-based expression of the HCG promoters is not a consequence of a correlation with CpG content, because within the HCG class the breadth of expression is independent of the CpG content. The overall depletion of CpGs throughout the genome is thought to be a consequence of the methylation of some germ-line CpGs and their susceptibility to mutation. A comparison of the frequencies of inferred deamination mutations at CpG and GpC dinucleotides in the two classes of promoters, using SNPs in human-chimpanzee sequence alignments, shows that CpGs mutate at a lower frequency in the HCG promoters, suggesting that CpGs in the HCG class are hypomethylated in the germ line.
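To make "CpG content" concrete, the following is a minimal sketch of one widely used measure, the observed/expected CpG ratio, computed as (number of CpG × sequence length) / (number of C × number of G). The abstract does not state the exact normalization the authors used, so the formula choice and the function name `cpg_obs_exp` are assumptions for illustration only.

```python
def cpg_obs_exp(seq: str) -> float:
    """Observed/expected CpG ratio of a DNA sequence.

    Illustrative helper (hypothetical name): the abstract does not specify
    the authors' normalization. Uses (#CpG * N) / (#C * #G).
    """
    s = seq.upper()
    n = len(s)
    c = s.count("C")
    g = s.count("G")
    cpg = s.count("CG")  # "CG" cannot overlap itself, so count() is exact
    if c == 0 or g == 0:
        return 0.0  # no C or no G means no CpG can occur
    return cpg * n / (c * g)


if __name__ == "__main__":
    # Toy sequences, not real promoter data.
    for name, seq in [("cpg_rich", "CGCGCGCG"), ("balanced", "CCGG"), ("no_cpg", "AATT")]:
        print(name, cpg_obs_exp(seq))
```

A ratio near 1 means CpGs occur about as often as the base composition predicts, while values well below 1 reflect the genome-wide depletion the abstract describes. Applying such a measure to a window around each transcription start site and plotting the distribution is one way to look for the two promoter classes without fixing a threshold in advance.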