Department of Biostatistics, Mailman School of Public Health, Columbia University.
Department of Statistics, Columbian College of Arts and Sciences, the George Washington University.
Nucleic Acids Res. 2019 Jan 10;47(1):e6. doi: 10.1093/nar/gky882.
Identifying epigenetic field defects, notably early DNA methylation alterations, is important for early cancer detection. Research has suggested that these early methylation alterations are infrequent across samples and identifiable as outlier samples. Here we developed a weighted epigenetic distance-based method that characterizes the (dis)similarity of methylation measures at multiple CpGs in a gene or genetic region between pairs of samples, using weights that up-weight signal CpGs and down-weight noise CpGs. With distance-based approaches, weak signals that might be filtered out in a CpG site-level analysis can be accumulated, boosting overall study power. In constructing epigenetic distances, we considered both differential methylation (DM) and differential variability (DV) signals. In simulation studies, the proposed weighted epigenetic distance-based method outperformed both its non-weighted versions and site-level EWAS (epigenome-wide association study) methods. Applied to breast cancer methylation data from the Gene Expression Omnibus (GEO), comparing tumor-adjacent normal tissue of breast cancer patients with normal tissue of independent, age-matched cancer-free women, the method identified novel epigenetic field defects that were missed by EWAS methods; the majority of these had previously been reported to be associated with breast cancer and were confirmed to be associated with progression to breast cancer. We further replicated some of the identified epigenetic field defects.
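The abstract does not specify the weighting scheme or the test statistic, so the sketch below only illustrates the general idea of a weighted distance-based region test under our own assumptions. All function names are hypothetical. DM weights are taken as normalized squared Welch t-statistics and DV weights as normalized squared log variance ratios. Distances are weighted Euclidean distances between samples. DM is tested with a PERMANOVA pseudo-F, and DV with a simplified PERMDISP-style difference in mean distance to the group centroid. P-values come from label permutations, with the weights recomputed in every permutation so that data-driven weighting does not inflate the type I error. None of these choices is claimed to be the published method.

```python
import numpy as np

EPS = 1e-12  # guards against division by zero / log(0)


def dm_weights(x, g):
    """Hypothetical DM weights: squared Welch t-statistics per CpG, normalized to sum to 1."""
    a, b = x[g == 0], x[g == 1]
    se = np.sqrt(a.var(axis=0, ddof=1) / len(a) + b.var(axis=0, ddof=1) / len(b))
    t2 = ((a.mean(axis=0) - b.mean(axis=0)) / (se + EPS)) ** 2 + EPS
    return t2 / t2.sum()


def dv_weights(x, g):
    """Hypothetical DV weights: squared log variance ratio per CpG, normalized to sum to 1."""
    a, b = x[g == 0], x[g == 1]
    lr = np.log((a.var(axis=0, ddof=1) + EPS) / (b.var(axis=0, ddof=1) + EPS))
    v = lr ** 2 + EPS
    return v / v.sum()


def weighted_dist(x, w):
    """Pairwise weighted Euclidean distances; x is (samples, CpGs), w is (CpGs,)."""
    diff = x[:, None, :] - x[None, :, :]
    return np.sqrt((w * diff ** 2).sum(axis=-1))


def pseudo_f(d, g):
    """Two-group PERMANOVA pseudo-F computed from a distance matrix."""
    n, d2 = len(g), d ** 2
    ss_total = d2[np.triu_indices(n, 1)].sum() / n
    ss_within = 0.0
    for k in (0, 1):
        idx = np.where(g == k)[0]
        sub = d2[np.ix_(idx, idx)]
        ss_within += sub[np.triu_indices(len(idx), 1)].sum() / len(idx)
    return (ss_total - ss_within) / (ss_within / (n - 2))


def dispersion_stat(x, g, w):
    """Simplified DV statistic: |difference| in mean weighted distance to own-group centroid."""
    means = []
    for k in (0, 1):
        xk = x[g == k]
        means.append(np.sqrt((w * (xk - xk.mean(axis=0)) ** 2).sum(axis=1)).mean())
    return abs(means[0] - means[1])


def perm_test(x, g, n_perm=999, seed=0):
    """Permutation p-values (DM, DV); weights are re-estimated for every permuted labeling."""
    rng = np.random.default_rng(seed)

    def stats(labels):
        f = pseudo_f(weighted_dist(x, dm_weights(x, labels)), labels)
        return f, dispersion_stat(x, labels, dv_weights(x, labels))

    f_obs, d_obs = stats(g)
    ge_f = ge_d = 0
    for _ in range(n_perm):
        f, d = stats(rng.permutation(g))
        ge_f += f >= f_obs
        ge_d += d >= d_obs
    return (ge_f + 1) / (n_perm + 1), (ge_d + 1) / (n_perm + 1)


# Toy region: 30 CpGs, 20 samples per group; 5 CpGs carry a DM shift in group 1.
rng = np.random.default_rng(1)
n, p = 20, 30
x = rng.beta(2, 5, size=(2 * n, p))
g = np.repeat([0, 1], n)
x[g == 1, :5] = np.clip(x[g == 1, :5] + 0.25, 0.0, 1.0)
p_dm, p_dv = perm_test(x, g, n_perm=199)
print(f"DM p = {p_dm:.3f}, DV p = {p_dv:.3f}")
```

Because the weights concentrate on the few shifted CpGs, their signal is not diluted by the noise CpGs in the region. Recomputing the weights inside each permutation is what keeps the permutation null valid when the weights themselves come from the data.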