Wu Michael C, Joubert Bonnie R, Kuan Pei-fen, Håberg Siri E, Nystad Wenche, Peddada Shyamal D, London Stephanie J
Department of Biostatistics; The University of North Carolina at Chapel Hill; Chapel Hill, NC USA; Public Health Sciences Division; Fred Hutchinson Cancer Research Center; Seattle, WA USA.
Division of Intramural Research; National Institute of Environmental Health Sciences; National Institutes of Health; Research Triangle Park, NC USA.
Epigenetics. 2014 Feb;9(2):318-29. doi: 10.4161/epi.27119. Epub 2013 Nov 15.
The Illumina Infinium HumanMethylation450 BeadChip has emerged as one of the most popular platforms for genome-wide profiling of DNA methylation. While the technology is widespread, systematic technical biases are believed to be present in the data. For example, this array incorporates two different chemical assays, i.e., Type I and Type II probes, which exhibit different technical characteristics and potentially complicate the computational and statistical analysis. Several normalization methods have been introduced recently to adjust for possible biases. However, there is considerable debate within the field about which normalization procedure should be used and, indeed, whether normalization is necessary at all. Despite the importance of the question, there has been little comprehensive comparison of normalization methods. We sought to systematically compare several popular normalization approaches, using the Norwegian Mother and Child Cohort Study (MoBa) methylation data set and the technical replicates analyzed with it as a case study. We assessed both the reproducibility between technical replicates following normalization and the effect of normalization on association analysis. Results indicate that the raw data are already highly reproducible. Some normalization approaches can slightly improve reproducibility, but others may introduce more variability into the data. Results also suggest that differences in association analysis after applying different normalizations are not large when the signal is strong. When the signal is more modest, however, different normalizations can yield very different numbers of findings that meet a weaker statistical significance threshold. Overall, our work provides a useful, objective assessment of the effectiveness of key normalization methods.
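To make the idea of reproducibility between technical replicates concrete, the following is a minimal sketch of one way such agreement could be summarized. The abstract does not state which metrics the authors used, so the function name `replicate_concordance`, the choice of mean absolute difference and Pearson correlation, and the input values are all assumptions made for illustration.

```python
import numpy as np


def replicate_concordance(beta_a, beta_b):
    """Summarize agreement between two technical replicates of one sample.

    beta_a, beta_b: 1-D arrays of methylation values in [0, 1], listed in
    the same probe order. Returns (mean absolute difference, Pearson r)
    computed across probes. Hypothetical helper, not the authors' method.
    """
    a = np.asarray(beta_a, dtype=float)
    b = np.asarray(beta_b, dtype=float)
    if a.shape != b.shape:
        raise ValueError("replicates must cover the same probes")
    # Drop probes that are missing in either replicate.
    keep = ~(np.isnan(a) | np.isnan(b))
    a, b = a[keep], b[keep]
    mean_abs_diff = float(np.mean(np.abs(a - b)))
    pearson_r = float(np.corrcoef(a, b)[0, 1])
    return mean_abs_diff, pearson_r


# Synthetic values for illustration only (not MoBa data).
raw_rep1 = np.array([0.10, 0.50, 0.90, 0.30])
raw_rep2 = np.array([0.12, 0.48, 0.88, 0.34])
mad, r = replicate_concordance(raw_rep1, raw_rep2)
print(f"mean |diff| = {mad:.3f}, Pearson r = {r:.3f}")
```

Under these assumptions, a normalization method could be judged by calling the same function on the raw and on the normalized values for each replicate pair. A smaller mean absolute difference after normalization would suggest improved reproducibility, and a larger one would suggest added variability.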