National Center for PTSD, VA Boston Healthcare System, Boston, MA, USA.
Department of Biostatistics, Boston University School of Public Health, Boston, MA, USA.
Epigenetics. 2022 Dec;17(13):2241-2258. doi: 10.1080/15592294.2022.2115600. Epub 2022 Sep 1.
Differentially methylated regions (DMRs) are genomic regions with specific methylation patterns across multiple loci that are associated with a phenotype. We examined the genome-wide false positive (GFP) rates of five widely used DMR methods (comb-p, Bumphunter, DMRcate, mCSEA, and coMethDMR) using both Illumina HumanMethylation450 (450K) and MethylationEPIC (EPIC) data and simulated continuous and dichotomous null phenotypes (i.e., phenotypes generated independently of the methylation data). coMethDMR provided well-controlled GFP rates (~5%) except when analysing skewed continuous phenotypes. DMRcate generally had well-controlled GFP rates when applied to 450K data, except for the skewed continuous phenotype; when applied to EPIC data, its GFP rates were well controlled only for the normally distributed continuous phenotype. GFP rates for mCSEA were at least 0.096, and comb-p yielded GFP rates above 0.34. Bumphunter had high GFP rates of at least 0.35 across conditions, reaching as high as 0.95. Analysis of the performance of these methods in specific regions of the genome found that regions with higher correlation across loci had, on average across methods, higher regional false positive rates. Based on the false positive rates, coMethDMR is the most strongly recommended analysis method, and DMRcate had acceptable performance when analysing 450K data. However, as both could produce more false positives for skewed continuous phenotypes, a normalizing transformation of skewed continuous phenotypes is suggested. This study highlights the importance of genome-wide simulations when evaluating the performance of DMR-analysis methods.
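The abstract suggests a normalizing transformation for skewed continuous phenotypes but does not name one. As an illustration only, the sketch below applies a rank-based inverse normal transformation, which is one common choice and is an assumption here, not necessarily the transformation used in the study. The function name `rank_inverse_normal` and the `offset` parameter are hypothetical.

```python
import random
from statistics import NormalDist


def rank_inverse_normal(values, offset=0.5):
    """Map values to standard-normal quantiles by rank.

    Tied values receive the average of their ranks. The offset of 0.5 is an
    assumed convention that keeps every probability strictly inside (0, 1).
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the end of the current group of tied values.
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    standard_normal = NormalDist()
    return [standard_normal.inv_cdf((r - offset) / n) for r in ranks]


# Simulated right-skewed null phenotype (log-normal), generated independently
# of any methylation data, then transformed before DMR analysis.
rng = random.Random(0)
phenotype = [rng.lognormvariate(0.0, 1.0) for _ in range(200)]
transformed = rank_inverse_normal(phenotype)
```

The transformation preserves the ordering of subjects while replacing the skewed values with approximately normal scores, so the transformed phenotype can be passed to a DMR method in place of the raw one.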