Department of Environmental Health Sciences, Mailman School of Public Health, Columbia University, New York, NY 10032, USA.
Cancer Epidemiol Biomarkers Prev. 2013 Jun;22(6):1052-60. doi: 10.1158/1055-9965.EPI-13-0114. Epub 2013 Apr 29.
DNA methylation microarrays have become an increasingly popular means of studying the role of epigenetics in cancer, although the methods used to analyze these arrays are still being developed and existing methods are not always widely disseminated among microarray users.
We investigated two problems likely to confront DNA methylation microarray users: (i) batch effects and (ii) the use of widely available pathway analysis software to analyze results. First, DNA taken from individuals exposed to low and high levels of drinking water arsenic was plated twice on Illumina's Infinium 450 K HumanMethylation Array, once in order of exposure and again following randomization. Second, we conducted simulations in which random CpG sites were drawn from the 450 K array and subjected to pathway analysis using Ingenuity's IPA software.
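The contrast between the two plating schemes can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the authors' analysis: the sample count, the size of the chip offset, the noise level, and the function name `simulate_design` are all hypothetical, and the randomized run is modeled as a balanced (stratified) layout with 12 samples per chip. No site has a true exposure effect, so any apparent difference between exposure groups comes from the chip alone.

```python
import random
from statistics import mean


def simulate_design(chip_of_sample, exposure_of_sample, n_sites=200,
                    chip_shift=0.05, noise_sd=0.01, seed=0):
    """Mean absolute high-vs-low difference in beta values across null sites.

    All parameter values are illustrative assumptions, not taken from the study.
    """
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_sites):
        low, high = [], []
        for chip, exposure in zip(chip_of_sample, exposure_of_sample):
            # No true exposure effect: baseline + chip offset + random noise.
            beta = 0.5 + chip_shift * chip + rng.gauss(0.0, noise_sd)
            (high if exposure == "high" else low).append(beta)
        diffs.append(abs(mean(high) - mean(low)))
    return mean(diffs)


exposure = ["low"] * 12 + ["high"] * 12

# Plated in order of exposure: chip 0 holds every low sample, chip 1 every high.
ordered_chips = [0] * 12 + [1] * 12

# Balanced randomization: each chip receives 6 low and 6 high samples.
layout_rng = random.Random(1)
low_chips = [0] * 6 + [1] * 6
high_chips = [0] * 6 + [1] * 6
layout_rng.shuffle(low_chips)
layout_rng.shuffle(high_chips)
randomized_chips = low_chips + high_chips

confounded = simulate_design(ordered_chips, exposure)
balanced = simulate_design(randomized_chips, exposure)
print(f"ordered by exposure: {confounded:.4f}")
print(f"balanced randomized: {balanced:.4f}")
```

In the ordered layout the chip offset is indistinguishable from an exposure effect, whereas in the balanced layout it contributes equally to both group means and cancels out of the comparison.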
The majority of differentially methylated CpG sites identified in Run One were due to batch effects; few sites were also identified in Run Two. In addition, the pathway analysis software reported many significant associations between our data, randomly drawn from the 450 K array, and various diseases and biological functions.
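One mechanism that could contribute to significant pathway results from random draws is that genes differ in how many array probes map to them. This mechanism is assumed here for illustration and is not stated in the abstract. The sketch below uses a made-up annotation (`probes_per_gene`) and a hypothetical helper (`gene_hit_rates`). It does not call IPA or use the real 450 K manifest. It draws random probes repeatedly and records how often each gene is hit at least once.

```python
import random
from collections import Counter
from statistics import mean


def gene_hit_rates(probes_per_gene, n_draw=1000, n_sims=200, seed=0):
    """Fraction of simulations in which each gene has at least one drawn probe."""
    rng = random.Random(seed)
    # One list entry per probe, labelled with the gene it maps to.
    probe_to_gene = [g for g, n in probes_per_gene.items() for _ in range(n)]
    hits = Counter()
    for _ in range(n_sims):
        drawn = rng.sample(probe_to_gene, n_draw)  # random probes, no repeats
        for gene in set(drawn):
            hits[gene] += 1
    return {g: hits[g] / n_sims for g in probes_per_gene}


# Hypothetical annotation: 900 genes with 2 probes each, 100 genes with 50 each.
probes_per_gene = {f"small_{i}": 2 for i in range(900)}
probes_per_gene.update({f"large_{i}": 50 for i in range(100)})

rates = gene_hit_rates(probes_per_gene)
small_rate = mean(r for g, r in rates.items() if g.startswith("small_"))
large_rate = mean(r for g, r in rates.items() if g.startswith("large_"))
print(f"genes with 2 probes hit in  {small_rate:.2f} of draws")
print(f"genes with 50 probes hit in {large_rate:.2f} of draws")
```

Under these assumptions, probe-rich genes enter the "significant" gene list far more often than probe-poor genes even though the CpG sites were chosen at random. Software that treats every gene as equally likely to be selected could then report enrichment for the pathways containing those probe-rich genes.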
These analyses illustrate the pitfalls of failing to control for chip-specific batch effects and of using pathway analysis software created for gene expression arrays to analyze DNA methylation array data.
We present evidence that (i) chip-specific effects can simulate plausible differential methylation results and (ii) popular pathway analysis software developed for expression arrays can yield spurious results when used in tandem with methylation microarrays.