Brief Funct Genomics. 2018 Jan 1;17(1):42-48. doi: 10.1093/bfgp/elx017.
The HumanMethylation450 BeadChip array (450K; Infinium) is a widely used tool in epigenomics. A recognized concern with the 450K platform is that the number of probes per gene (PG) may bias the ranking of differentially methylated (DM) CpGs (DM-CpGs) before testing for enrichment of gene ontology categories. We previously showed in a fatty acid (FA)-induced DNA methylation profiling study that when DM-CpGs are ranked by the ratio of the number of called DM-CpGs to PG, the list of the 150 top-ranking genes is enriched in pathways that overlap with the corresponding Affymetrix array-based expression data. In this study, a comparative analysis of thirteen 450K-based studies representing FA-stimulated cellular models, aging, and diseased and normal tissues revealed that the 150 top-ranking DM-CpGs are in high-PG genes. This points to a significant false-negative rate in the low-PG gene set when delta-beta-based ranking is performed. We show that PG is not related to the density of methylation-prone sites, as it does not follow gene length or GC content. Conversely, ranking genes by the DM-CpGs-to-PG ratio and analysing the 150 top-ranking entries yields significantly enriched disease- or tissue-specific gene function categories that are increased both in number and in the degree of overlap with expression data compared with delta-beta-only ranking or with the previously published gometh-based pipeline. The list of the 15 top-ranking loci is also significantly enriched in non-coding RNAs, a greatly underrepresented transcript type in 450K. In summary, the proposed simple normalization method yields pathobiologically relevant DM-CpGs. This method is also relevant for the newly developed MethylationEPIC (Infinium) microarray.
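The DM-CpGs-to-PG normalization summarized in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: the function name `rank_genes_by_dm_to_pg`, the probe identifiers, the gene names, and the alphabetical tie-breaking rule are all hypothetical, and the abstract does not say how DM-CpGs are called or how probes annotated to several genes are handled. The sketch assumes each probe maps to exactly one gene and that the set of called DM-CpGs is already available.

```python
from collections import Counter


def rank_genes_by_dm_to_pg(probe_to_gene, dm_probes, top_n=150):
    """Rank genes by (called DM-CpG probes) / (probes per gene, PG).

    probe_to_gene: dict mapping every array probe ID to one gene (assumption:
                   one gene per probe).
    dm_probes:     iterable of probe IDs already called differentially methylated.
    top_n:         number of top-ranking genes to return (150 in the abstract).
    """
    # PG: how many array probes are annotated to each gene.
    pg = Counter(probe_to_gene.values())
    # Number of called DM-CpGs per gene; probes missing from the annotation are skipped.
    dm = Counter(probe_to_gene[p] for p in dm_probes if p in probe_to_gene)
    # Normalize the DM-CpG count by PG so high-PG genes are not favoured.
    ratios = {gene: dm[gene] / pg[gene] for gene in dm}
    # Highest ratio first; ties broken alphabetically (an arbitrary choice made here).
    ranked = sorted(ratios.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_n]


# Toy annotation with hypothetical probe IDs: GENE_A has 10 probes,
# GENE_B has 2, GENE_C has 4.
probe_to_gene = {}
probe_to_gene.update({f"a{i}": "GENE_A" for i in range(10)})
probe_to_gene.update({f"b{i}": "GENE_B" for i in range(2)})
probe_to_gene.update({f"c{i}": "GENE_C" for i in range(4)})

# Called DM-CpGs: 3 in GENE_A, 2 in GENE_B, 1 in GENE_C.
dm_probes = ["a0", "a1", "a2", "b0", "b1", "c0"]

ranked = rank_genes_by_dm_to_pg(probe_to_gene, dm_probes, top_n=2)
for gene, ratio in ranked:
    print(gene, round(ratio, 3))
```

In this toy input, GENE_A has the most called DM-CpGs in absolute terms, yet the low-PG GENE_B ranks first once the count is divided by PG. This is the kind of low-PG gene the abstract argues is missed without normalization.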