College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China.
Gene. 2012 Sep 10;506(1):36-42. doi: 10.1016/j.gene.2012.06.075. Epub 2012 Jul 4.
Some researchers normalize DNA methylation array data to remove technical artifacts introduced by experimental differences in sample preparation, array processing and other factors. Others analyze DNA methylation arrays without normalization, on the grounds that current normalization methods for methylation data may distort real differences between normal and cancer samples: cancer genomes may be extensively hypomethylated, and the total amount of CpG methylation might differ substantially among samples. In this study, using eight datasets generated with the Infinium HumanMethylation27 assay, we systematically analyzed the global distribution of DNA methylation changes in cancer compared with normal controls and its effect on data normalization for selecting differentially methylated (DM) genes. We showed that more DM genes could be found in the Quantile/Lowess-normalized data than in the non-normalized data. The DM genes additionally selected in the Quantile/Lowess-normalized data showed significantly consistent methylation states in another independent dataset for the same cancer, indicating that these extra DM genes were effective biological signals related to the disease. These results suggest that normalization can increase the power of detecting DM genes in the context of diagnostic markers, which are usually characterized by relatively large effect sizes. In addition, we evaluated the reproducibility of DM discoveries for a particular cancer type and found that most of the DM genes additionally detected in one dataset showed the same methylation directions in the other dataset for the same cancer type, indicating that these DM genes were effective biological signals in the other dataset. Furthermore, we showed that some DM genes detected in different studies of a particular cancer type were significantly reproducible at the functional level.
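To make the idea of quantile normalization concrete, here is a minimal sketch in plain Python. It is not the authors' pipeline: the function name `quantile_normalize`, the list-of-samples layout and the toy beta values are all assumptions for illustration, and ties are broken by probe order rather than by averaging tied ranks as dedicated packages typically do.

```python
from statistics import mean


def quantile_normalize(columns):
    """Quantile-normalize samples given as equal-length lists of values.

    columns[j][i] is the value of probe i in sample j (hypothetical layout).
    Ties are broken by probe order, which is a simplification.
    """
    if not columns:
        return []
    n_probes = len(columns[0])
    if any(len(col) != n_probes for col in columns):
        raise ValueError("all samples must have the same number of probes")

    # Sort each sample, then average across samples at each rank
    # to obtain the shared reference distribution.
    sorted_cols = [sorted(col) for col in columns]
    reference = [mean(col[r] for col in sorted_cols) for r in range(n_probes)]

    normalized = []
    for col in columns:
        # Probe indices ordered from smallest to largest value in this sample.
        order = sorted(range(n_probes), key=lambda i: col[i])
        out = [0.0] * n_probes
        for rank, probe in enumerate(order):
            # Replace each value with the reference value at its rank.
            out[probe] = reference[rank]
        normalized.append(out)
    return normalized


# Toy methylation values (made up): three samples, three probes each.
samples = [
    [0.10, 0.80, 0.40],  # sample A
    [0.20, 0.90, 0.60],  # sample B
    [0.05, 0.70, 0.50],  # sample C
]
normalized = quantile_normalize(samples)
```

After this step every sample has the same set of values, and only the ranking of probes within each sample is preserved. That forced equality of distributions is the property behind the concern that normalization might mask a genuine global shift such as hypomethylation in cancer samples.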