College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China.
Nucleic Acids Res. 2010 Jan;38(1):e6. doi: 10.1093/nar/gkp882. Epub 2009 Oct 23.
CpG islands (CGIs) are regions rich in CpG dinucleotides relative to the CpG-depleted bulk DNA of mammalian genomes. They are generally regarded as epigenetic regulatory regions associated with lack of methylation, promoter activity and histone modifications. Accurate identification of CGIs with epigenetic regulatory function in bulk genomes is of wide interest. Here, the common features of functional CGIs are identified using an average mutual information method, which differentiates functional CGIs from the remaining CGIs. A new approach, CpG mutual information (CpG_MI), was further explored to identify functional CGIs based on the cumulative mutual information of the physical distances between neighboring CpGs. Compared with current approaches, CpG_MI achieved the highest prediction accuracy. It also identified new functional CGIs that overlap gene promoter regions and were missed by other algorithms. Nearly all CGIs identified by CpG_MI overlapped with histone modification marks. CpG_MI could also be used to identify potential functional CGIs in other mammalian genomes, because the CpG dinucleotide contents and cumulative mutual information distributions were almost the same among the six mammalian genomes analyzed. CpG_MI is a reliable quantitative tool for identifying functional CGIs in bulk genomes, and it helps in understanding the relationships between genomic functional elements and epigenomic modifications.
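The distance-based idea described above can be illustrated with a small sketch. This is not the published CpG_MI algorithm, since the abstract does not give its formula. It only shows one plausible reading: collect the physical distances between neighboring CpGs and estimate the mutual information between a distance and the one that follows it. The function names, the lag-1 pairing and the plug-in estimator are all assumptions made for illustration.

```python
import math
from collections import Counter


def cpg_positions(seq):
    """Return 0-based start positions of every CpG (5'-CG-3') dinucleotide."""
    seq = seq.upper()
    return [i for i in range(len(seq) - 1) if seq[i:i + 2] == "CG"]


def neighbor_distances(positions):
    """Physical distance (in bp) between each pair of neighboring CpGs."""
    return [b - a for a, b in zip(positions, positions[1:])]


def mutual_information(xs, ys):
    """Plug-in estimate of mutual information (bits) between two
    equal-length sequences of discrete values."""
    n = len(xs)
    if n == 0:
        return 0.0
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    mi = 0.0
    for (x, y), count in joint.items():
        p_xy = count / n
        # p_xy / (p_x * p_y), written with raw counts
        mi += p_xy * math.log2(p_xy * n * n / (px[x] * py[y]))
    return mi


def lagged_distance_mi(seq, lag=1):
    """Assumed illustration: MI between each CpG-to-CpG distance and the
    distance `lag` steps later (lag must be >= 1)."""
    d = neighbor_distances(cpg_positions(seq))
    return mutual_information(d[:-lag], d[lag:])
```

Calling `lagged_distance_mi` on a candidate region gives a single score in bits. A genome-wide method would also need a windowing scheme and a decision threshold, neither of which is specified in the abstract, so they are left out here.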