Laboratory of Neurobiology, Library and Information Services, and Biostatistics Branch, National Institute of Environmental Health Sciences, National Institutes of Health, Department of Health and Human Services, Research Triangle Park, NC 27709, USA.
Proc Natl Acad Sci U S A. 2011 Jun 7;108(23):9715-20. doi: 10.1073/pnas.1105713108. Epub 2011 May 20.
Methyl-sensitive cut counting (MSCC) with the HpaII methylation-sensitive restriction enzyme is a cost-effective method to pinpoint unmethylated CpGs at single-base-pair resolution. However, it addresses only CpGs within the CCGG site, leaving out the remaining 15 of the 16 possible XCGX tetranucleotides in which CpGs occur. We expanded MSCC to include three additional enzymes, covering a total of 5 of the 16 XCGX combinations. This allowed us to survey methylation at about one-third of all CpGs in a mammalian genome. Applying the method to mouse liver DNA, we confirmed data reported with other methods showing that hypomethylation is concentrated at promoters and in CpG islands (CGIs), whereas gene bodies and intergenic regions are mostly methylated. By grouping unmethylated CpGs, identified by high MSCC scores (7% false discovery rate), we found a large number of unmethylated regions in intergenic and intronic DNA that do not qualify as CGIs. These regions are highly enriched in functional DNA sequences (Open Regulatory Annotation database) and in noncoding yet highly conserved mammalian sequences thought to be important but of as yet unknown function. About 50% of MSCC-defined unmethylated regions do not overlap algorithm-defined CGIs; they offer a novel search space in which new functions of DNA may be found in health and disease.
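The two quantitative ideas in the abstract can be sketched in a few lines. The first is how covering a subset of XCGX contexts translates into a fraction of CpGs surveyed. The second is how high-scoring CpGs can be grouped into regions and compared with CGI intervals. This is a minimal illustration under stated assumptions, not the authors' pipeline. Only CCGG (HpaII) comes from the text. Any other targeted context, the strand handling (forward strand only), and the grouping parameters `max_gap` and `min_sites` are hypothetical. The input positions are assumed to be already filtered at an MSCC-score cutoff.

```python
from collections import Counter
from itertools import product

# All 16 XCGX tetranucleotides: any base, then the CpG, then any base.
XCGX_CONTEXTS = {x + "CG" + y for x, y in product("ACGT", repeat=2)}


def cpg_context_counts(seq: str) -> Counter:
    """Count CpGs on the given (forward) strand by their XCGX context."""
    seq = seq.upper()
    counts = Counter()
    # A CpG starting at i needs one flanking base on each side.
    for i in range(1, len(seq) - 2):
        if seq[i:i + 2] == "CG":
            context = seq[i - 1:i + 3]
            if context in XCGX_CONTEXTS:  # skips contexts containing N, etc.
                counts[context] += 1
    return counts


def fraction_addressed(counts: Counter, targeted: set) -> float:
    """Fraction of counted CpGs whose context is in the targeted set."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum(n for ctx, n in counts.items() if ctx in targeted) / total


def group_sites(positions, max_gap=100, min_sites=2):
    """Merge CpG positions into regions when neighbours are <= max_gap bp apart.

    max_gap and min_sites are hypothetical illustration parameters.
    Returns inclusive (start, end) tuples.
    """
    regions = []
    start = prev = None
    n = 0
    for pos in sorted(positions):
        if prev is not None and pos - prev <= max_gap:
            prev, n = pos, n + 1  # extend the current region
        else:
            if prev is not None and n >= min_sites:
                regions.append((start, prev))  # close the previous region
            start = prev = pos
            n = 1
    if prev is not None and n >= min_sites:
        regions.append((start, prev))
    return regions


def fraction_outside(regions, islands) -> float:
    """Fraction of regions sharing no base with any island (inclusive coords)."""
    if not regions:
        return 0.0
    outside = sum(
        1
        for rstart, rend in regions
        if not any(rstart <= cend and cstart <= rend for cstart, cend in islands)
    )
    return outside / len(regions)
```

For example, `fraction_addressed(cpg_context_counts(seq), {"CCGG"})` gives the share of CpGs in `seq` that an HpaII-only assay could address. Because the result is weighted by how often each context actually occurs, covering 5 of 16 contexts need not correspond to exactly 5/16 of a genome's CpGs.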