Butcher Lee M, Beck Stephan
UCL Cancer Institute, University College London, London WC1E 6BT, UK.
Methods. 2015 Jan 15;72:21-8. doi: 10.1016/j.ymeth.2014.10.036. Epub 2014 Nov 24.
The speed and resolution at which we can scour the genome for DNA methylation changes have improved immeasurably in the last 10 years, and the advent of the Illumina 450K BeadChip has made epigenome-wide association studies (EWAS) a reality. The resulting datasets are conveniently formatted to allow easy alignment of significant hits to genes and genetic features; however, methods that parse significant hits into discrete differentially methylated regions (DMRs) remain a challenge to implement. In this paper we present details of a novel DMR caller, the Probe Lasso: a flexible window-based approach that gathers neighbouring significant signals to define clear DMR boundaries for subsequent in-depth analysis. The method is implemented in the R package ChAMP (Morris et al., 2014) and returns sets of DMRs according to user-tuned levels of probe filtering (e.g., inclusion of sex chromosomes, polymorphisms) and probe-lasso size distribution. Using a sub-sample of colon cancer and healthy colon samples from TCGA, we show that Probe Lasso shifts DMR calling away from just probe-dense regions and calls DMRs ranging in size from tens of bases to tens of kilobases. Moreover, using TCGA data we show that Probe Lasso leverages more information from the array and highlights a potential role of hypomethylated transcription factor binding motifs that is not discoverable using a basic, fixed-window approach.
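The general idea of gathering neighbouring significant probes into regions can be illustrated with a minimal Python sketch. This is not the ChAMP implementation, and the abstract does not specify how lasso sizes are chosen. The `Probe` class, the per-probe `lasso` radius, the `alpha` threshold, and the `min_probes` cutoff are all hypothetical names and values introduced for illustration. The sketch assumes that two adjacent significant probes belong to the same region when their lasso windows touch or overlap.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Probe:
    chrom: str
    pos: int      # genomic coordinate of the probe
    pval: float   # significance of the methylation difference
    lasso: int    # hypothetical window radius (bases) assigned to this probe


Region = Tuple[str, int, int, int]  # (chrom, start, end, number of probes)


def call_regions(probes: List[Probe], alpha: float = 0.05,
                 min_probes: int = 3) -> List[Region]:
    """Group neighbouring significant probes whose lasso windows overlap."""
    # Keep only significant probes, ordered along each chromosome.
    sig = sorted((p for p in probes if p.pval < alpha),
                 key=lambda p: (p.chrom, p.pos))

    regions: List[Region] = []
    group: List[Probe] = []

    def flush() -> None:
        # Emit the current group as a region if it has enough probes.
        if len(group) >= min_probes:
            regions.append((group[0].chrom, group[0].pos,
                            group[-1].pos, len(group)))

    for p in sig:
        if group:
            prev = group[-1]
            # Windows overlap when the gap is within the two radii combined.
            joined = (p.chrom == prev.chrom
                      and p.pos - prev.pos <= p.lasso + prev.lasso)
            if not joined:
                flush()
                group = []
        group.append(p)
    flush()
    return regions
```

Because the radius is stored per probe, a sparse region can be given wider lassos than a probe-dense one, which is one way a flexible window differs from a fixed-window scan in this toy setting.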