School of Information and Control Engineering, Qingdao University of Technology, Qingdao, Shandong, China.
College of Computer and Communication Engineering, China University of Petroleum (East China), Qingdao, Shandong, China.
Biomed Res Int. 2018 Nov 18;2018:1070645. doi: 10.1155/2018/1070645. eCollection 2018.
DNA methylation is essential for regulating gene expression, and changes in DNA methylation status are commonly observed in disease. Therefore, identifying differential methylation patterns, especially differentially methylated regions (DMRs), between two groups is important for understanding the mechanisms of complex diseases. A few tools identify DMRs by considering features of methylation data, but no current method comprehensively integrates the characteristics of DNA methylation data.
Accounting for the characteristics of methylation data, such as the correlation between neighboring CpG sites and the high heterogeneity of DNA methylation data, we propose a data-driven approach for DMR identification that evaluates the energy of each single site using a modified 1D Ising model. We apply our approach to both simulated and publicly available datasets and compare its performance with that of other popular methods. Simulation results show that our method is more sensitive than competing methods. On real data, our method identifies more common DMRs than DMRcate, ProbeLasso, and Wang's method, with a high overlap ratio. We also show that integrating heterogeneity and correlation is necessary for DMR identification, by comparing our results with those obtained when only the mean or only the variance signal is considered, and when the relationship between neighboring CpG sites is ignored, respectively. Analyzing the number of DMRs identified in real data across different genomic regions, we find that about 90% of DMRs are located in CpG islands (CGIs), which commonly regulate gene expression. This finding may help us understand the functional effect of DNA methylation on disease.
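The abstract does not give the form of the modified model, so the following is only a minimal sketch of the per-site energy in a textbook 1D Ising chain, not the authors' method. The function name `site_energies`, the `coupling` parameter, and the encoding of each CpG site as a spin (+1 or -1) with an external field (for example, a per-site difference statistic) are all assumptions made for illustration.

```python
import numpy as np


def site_energies(spins, fields, coupling=1.0):
    """Per-site energy of a standard 1D Ising chain with open boundaries.

    E_i = -coupling * s_i * (s_{i-1} + s_{i+1}) - h_i * s_i

    spins  : sequence of +1/-1 values, one per CpG site (hypothetical encoding)
    fields : sequence of external-field values h_i, one per site (hypothetical)
    """
    s = np.asarray(spins, dtype=float)
    h = np.asarray(fields, dtype=float)

    # Neighbor spins; sites at the chain ends have a missing neighbor (treated as 0).
    left = np.zeros_like(s)
    right = np.zeros_like(s)
    left[1:] = s[:-1]
    right[:-1] = s[1:]

    # Neighbor-interaction term plus external-field term.
    return -coupling * s * (left + right) - h * s


if __name__ == "__main__":
    # Three adjacent sites in the same state, followed by one in the opposite state.
    print(site_energies([1, 1, 1, -1], [0.0, 0.0, 0.0, 0.0]))
```

In this sketch, a site that agrees with both neighbors has lower energy than a site that disagrees with them, which is how a chain model of this kind can reflect correlation between neighboring CpG sites.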