Lee Wonyul, Morris Jeffrey S
Department of Biostatistics, The University of Texas M.D. Anderson Cancer Center, Houston, TX, USA.
Bioinformatics. 2016 Mar 1;32(5):664-72. doi: 10.1093/bioinformatics/btv659. Epub 2015 Nov 11.
DNA methylation is a key epigenetic modification that can modulate gene expression. Over the past decade, many studies have focused on profiling DNA methylation and investigating its alterations in complex diseases such as cancer. While early studies were mostly restricted to CpG islands or promoter regions, recent findings indicate that many important DNA methylation changes can occur in other regions, so DNA methylation needs to be examined on a genome-wide scale. In this article, we apply the wavelet-based functional mixed model methodology to high-throughput methylation data to identify differentially methylated loci across the genome. Unlike many commonly used methods that model probes independently, this framework accommodates spatial correlations across the genome through basis function modeling, as well as correlations between samples through functional random effects. This allows it to be applied in many different settings and potentially gives it more power to detect differential methylation.
We applied this framework to three high-dimensional methylation data sets that have been studied previously (CpG Shore data, THREE data and NIH Roadmap Epigenomics data). A simulation study based on the CpG Shore data suggested that, for detecting differentially methylated loci, this wavelet-based modeling approach outperforms analogous approaches that model the loci independently. For the THREE data, the method suggests new regions of differential methylation that were not reported in the original study.
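The contrast between modeling loci jointly through a wavelet basis and modeling them independently can be illustrated with a small sketch. This is not the WFMM software and not the Bayesian functional mixed model itself: it has no random effects, it uses a plain Haar basis, and hard thresholding of per-coefficient t-like statistics stands in for the model's shrinkage. All function names, the threshold `t_cut`, and the simulated data are hypothetical choices made only for illustration.

```python
import numpy as np

def haar_dwt(x):
    """Orthonormal Haar decomposition along the last axis.

    The length of the last axis must be a power of two. Output layout:
    [approximation, coarsest detail, ..., finest detail].
    """
    approx = np.asarray(x, dtype=float)
    n = approx.shape[-1]
    assert n > 0 and n & (n - 1) == 0, "length must be a power of two"
    details = []
    while approx.shape[-1] > 1:
        even, odd = approx[..., 0::2], approx[..., 1::2]
        details.append((even - odd) / np.sqrt(2.0))
        approx = (even + odd) / np.sqrt(2.0)
    return np.concatenate([approx] + details[::-1], axis=-1)

def haar_idwt(c):
    """Inverse of haar_dwt (same coefficient layout)."""
    c = np.asarray(c, dtype=float)
    n = c.shape[-1]
    approx, pos = c[..., :1], 1
    while pos < n:
        detail = c[..., pos:2 * pos]
        out = np.empty(approx.shape[:-1] + (2 * pos,))
        out[..., 0::2] = (approx + detail) / np.sqrt(2.0)
        out[..., 1::2] = (approx - detail) / np.sqrt(2.0)
        approx, pos = out, 2 * pos
    return approx

def wavelet_group_effect(profiles, groups, t_cut=3.0):
    """Estimate a two-group difference curve in the wavelet domain.

    profiles: samples x loci array; groups: 0/1 label per sample.
    Coefficients whose mean difference is small relative to its
    standard error are set to zero before transforming back.
    """
    groups = np.asarray(groups)
    coefs = haar_dwt(profiles)
    a, b = coefs[groups == 0], coefs[groups == 1]
    diff = b.mean(axis=0) - a.mean(axis=0)
    se = np.sqrt(a.var(axis=0, ddof=1) / len(a) + b.var(axis=0, ddof=1) / len(b))
    keep = np.abs(diff) > t_cut * se
    return haar_idwt(np.where(keep, diff, 0.0))

# Simulated example: 20 samples, 64 loci, one differentially methylated block.
rng = np.random.default_rng(0)
groups = np.repeat([0, 1], 10)
true_effect = np.zeros(64)
true_effect[16:32] = 1.0
profiles = rng.normal(0.0, 0.5, size=(20, 64)) + np.outer(groups, true_effect)

smoothed_effect = wavelet_group_effect(profiles, groups)
independent_effect = profiles[groups == 1].mean(axis=0) - profiles[groups == 0].mean(axis=0)
```

Here `independent_effect` treats each locus separately, while `smoothed_effect` borrows strength across neighboring loci because each retained wavelet coefficient spans several positions. With `t_cut=0.0` the two estimates coincide, since the transform is linear and invertible.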
Automated software called WFMM is available at https://biostatistics.mdanderson.org/SoftwareDownload. CpG Shore data is available at http://rafalab.dfci.harvard.edu. NIH Roadmap Epigenomics data is available at http://compbio.mit.edu/roadmap.
Supplementary data are available at Bioinformatics online.