John P. Hussman Institute for Human Genomics, University of Miami Miller School of Medicine, Miami, FL 33136, USA.
Division of Biostatistics, Department of Public Health Sciences, University of Miami Miller School of Medicine, Miami, FL 33136, USA.
Nucleic Acids Res. 2019 Sep 26;47(17):e98. doi: 10.1093/nar/gkz590.
Array-based technology has made it possible to measure DNA methylation profiles genome-wide in a cost-effective and comprehensive manner for epigenome-wide association studies. However, identifying differentially methylated regions (DMRs) remains challenging because of the complexities of DNA methylation data. Supervised methods typically focus on regions that contain consecutive, highly significant differentially methylated CpGs, but may lack power to detect small but consistent changes when few CpGs pass a stringent significance threshold after multiple-comparison correction. Unsupervised methods first group CpGs by genomic annotation and then test the groups against phenotype, but may lack specificity because the regional boundaries of methylation are often not well defined. We present coMethDMR, a flexible, powerful, and accurate tool for identifying DMRs. Instead of testing all CpGs within a genomic region, coMethDMR first carries out an additional step that selects co-methylated sub-regions. It then tests the association between methylation levels within each sub-region and phenotype via a random coefficient mixed effects model that simultaneously models variation between CpG sites within the region and differential methylation. coMethDMR offers a well-controlled Type I error rate, improved specificity, and focused testing of targeted genomic regions, and is available as an open-source R package.
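The sub-region selection step described above can be illustrated with a minimal sketch. It assumes that co-methylation is scored by a leave-one-out correlation (each CpG against the mean of the other CpGs in the region) and that a sub-region is a contiguous run of CpGs passing a threshold. The function names, the threshold of 0.4, the minimum of 3 CpGs, and the toy data are all illustrative assumptions; this is not the coMethDMR package's API, and the abstract does not specify these details.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length lists; 0.0 if either is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def leave_one_out_correlations(region):
    """For each CpG, correlate it with the per-sample mean of the other CpGs.

    `region` is a list of CpG rows ordered by genomic position; each row
    holds one methylation value per sample.
    """
    n_samples = len(region[0])
    scores = []
    for i, row in enumerate(region):
        others = [r for j, r in enumerate(region) if j != i]
        mean_others = [
            sum(r[s] for r in others) / len(others) for s in range(n_samples)
        ]
        scores.append(pearson(row, mean_others))
    return scores


def comethylated_subregions(region, r_threshold=0.4, min_cpgs=3):
    """Return (first_index, last_index) for each contiguous run of CpGs whose
    leave-one-out correlation exceeds r_threshold (assumed values, see above)."""
    keep = [r > r_threshold for r in leave_one_out_correlations(region)]
    runs, start = [], None
    # A trailing False closes any run that reaches the end of the region.
    for i, flag in enumerate(keep + [False]):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            if i - start >= min_cpgs:
                runs.append((start, i - 1))
            start = None
    return runs


# Toy region: 5 CpGs x 6 samples. The first three CpGs share one pattern;
# the last two vary independently of it.
shared = [0.1, 0.5, 0.9, 0.3, 0.7, 0.2]
region = [
    shared,
    [v + 0.05 for v in shared],
    [0.9 * v for v in shared],
    [0.6, 0.4, 0.5, 0.6, 0.4, 0.5],
    [0.5, 0.6, 0.4, 0.4, 0.6, 0.5],
]
subregions = comethylated_subregions(region)
print(subregions)
```

On this toy data the first three CpGs form the single reported sub-region. In a full analysis, the methylation values of each selected sub-region would then be passed to the mixed effects model to test association with phenotype.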