Zhang Yuanyuan, Zhang Junying
School of Computer Science and Technology, Xidian University, Xi'an 710071, Shaanxi, China.
Mol Biosyst. 2015 Jul;11(7):1786-93. doi: 10.1039/c5mb00141b.
DNA methylation is essential not only in cellular differentiation but also in disease. Identifying differentially methylated patterns between case and control groups is important for understanding the mechanisms and possible functionality of complex diseases. We propose a method for finding potentially functional methylated regions, that is, regions that are not only differentially methylated but also affect gene expression. The method integrates methylation and gene expression data and is based on distance discriminant analysis (DDA). Identifying differentially methylated regions (DMRs) with this procedure does not require clustering methylation sites or partitioning the genome in advance; as a result, the identified DMRs have larger coverage than those found by bump hunting and Ong's method. Furthermore, gene expression data are incorporated as a complementary source: whether a DMR is functional is determined by estimating the difference in expression of the corresponding genes. In a comparison with bump hunting and Ong's method on simulated data, our method is more powerful in identifying DMRs that span a larger distance in the genome or consist of only a few sites, and it achieves higher sensitivity and specificity. It is also more robust to data heterogeneity. Applied to several real datasets, the method shows that most functional DMRs are hyper-methylated and located in CpG-rich regions (e.g. islands, TSS200 and TSS1500), consistent with the fact that methylation levels of CpG islands are higher in tumors than in normal tissue. By comparing and analyzing the results across datasets, we find that methylation changes in some regions may be related to disease through changes in the expression of the corresponding genes, which demonstrates the effectiveness of our method.
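For readers unfamiliar with distance discriminant analysis, the following is a minimal sketch of the generic technique only, not the authors' procedure: a sample is assigned to whichever group mean is nearest in Mahalanobis distance, and the same distance between the two group means gives a rough measure of how well a set of sites separates cases from controls. The function names, the pooled-covariance choice, the `ridge` regularizer, and the simulated beta values are all assumptions made for illustration.

```python
import numpy as np

def fit_distance_discriminant(case, control, ridge=1e-6):
    """Estimate group means and the inverse pooled covariance.

    case, control: arrays of shape (n_samples, n_sites) holding
    methylation levels for one candidate region (hypothetical layout).
    """
    case = np.asarray(case, dtype=float)
    control = np.asarray(control, dtype=float)
    n1, n2 = len(case), len(control)
    cov_case = np.atleast_2d(np.cov(case, rowvar=False))
    cov_ctrl = np.atleast_2d(np.cov(control, rowvar=False))
    pooled = ((n1 - 1) * cov_case + (n2 - 1) * cov_ctrl) / (n1 + n2 - 2)
    # Small ridge term (an assumption) keeps the matrix invertible.
    pooled += ridge * np.eye(case.shape[1])
    return case.mean(axis=0), control.mean(axis=0), np.linalg.inv(pooled)

def mahalanobis_sq(x, mean, cov_inv):
    """Squared Mahalanobis distance from x to a group mean."""
    d = np.asarray(x, dtype=float) - mean
    return float(d @ cov_inv @ d)

def classify(x, mu_case, mu_ctrl, cov_inv):
    """Assign x to the group whose mean is nearer."""
    d_case = mahalanobis_sq(x, mu_case, cov_inv)
    d_ctrl = mahalanobis_sq(x, mu_ctrl, cov_inv)
    return "case" if d_case < d_ctrl else "control"

def group_separation(mu_case, mu_ctrl, cov_inv):
    """Squared Mahalanobis distance between the two group means."""
    return mahalanobis_sq(mu_case, mu_ctrl, cov_inv)

# Simulated methylation levels for a three-site region (made-up data).
rng = np.random.default_rng(0)
case = rng.normal(0.8, 0.05, size=(20, 3))
control = rng.normal(0.2, 0.05, size=(20, 3))
model = fit_distance_discriminant(case, control)
print(classify([0.75, 0.80, 0.78], *model))
print(group_separation(*model))
```

A large separation value for a region would only suggest differential methylation; deciding whether the region is functional, as the abstract describes, additionally requires the matched gene expression data.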