McGregor Kevin, Bernatsky Sasha, Colmegna Ines, Hudson Marie, Pastinen Tomi, Labbe Aurélie, Greenwood Celia M T
McGill University, Department of Epidemiology, Biostatistics, and Occupational Health, 1020 Pine Ave. West, Montréal, H3A 1A2, QC, Canada.
Lady Davis Research Institute, Jewish General Hospital, 3755 Chemin de la Côte Sainte Catherine, Montréal, H3T 1E2, QC, Canada.
Genome Biol. 2016 May 3;17:84. doi: 10.1186/s13059-016-0935-y.
Many different methods exist to adjust for variability in cell-type mixture proportions when analyzing DNA methylation data. Here we present the results of an extensive simulation study, built on cell-separated DNA methylation profiles from the Illumina Infinium 450K array, comparing the performance of eight methods, including the most commonly used approaches.
We designed a rich, multi-layered simulation containing a set of probes with true associations with either binary or continuous phenotypes, confounding by cell type, variability in means and standard deviations for population parameters, additional variability at the level of each individual cell-type-specific sample, and variability in the mixture proportions across samples. Performance varied substantially across methods and simulations. In particular, the number of false positives was sometimes unrealistically high, indicating a limited ability to discriminate true signals from those appearing significant through confounding. Methods that filtered probes consequently had poor power. QQ plots of p values across all tested probes showed that adjustment did not always improve the distribution. The same methods were used to examine associations between smoking and methylation in a case-control study of colorectal cancer, and we also explored the effect of cell-type adjustment on associations between rheumatoid arthritis cases and controls.
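To illustrate how confounding by cell type can produce spurious associations, here is a minimal NumPy sketch; it is not the authors' simulation. It assumes three hypothetical cell types whose Dirichlet-distributed mixture proportions shift with a binary phenotype, synthetic cell-type mean methylation levels drawn uniformly (instead of the cell-separated 450K profiles used in the study), and small sample-level noise on each cell-type-specific profile. No probe is directly associated with the phenotype, so every "hit" from an unadjusted per-probe regression is a false positive induced by the mixture proportions; adjusting for the (here known) proportions should bring the rate back near the nominal 5%. The function names `simulate_mixture_confounding` and `phenotype_t_stats` are hypothetical.

```python
import numpy as np


def simulate_mixture_confounding(n_samples=400, n_probes=500, seed=0):
    """Simulate bulk methylation as a mixture of 3 hypothetical cell types.

    No probe is directly associated with the phenotype; any association
    arises only because mixture proportions differ between cases and controls.
    """
    rng = np.random.default_rng(seed)
    y = rng.integers(0, 2, size=n_samples)  # binary phenotype (0 = control, 1 = case)

    # Mixture proportions: cases are enriched for cell type 0 (the confounder).
    alpha_control = np.array([4.0, 4.0, 4.0])
    alpha_case = np.array([8.0, 4.0, 4.0])
    props = np.vstack([rng.dirichlet(alpha_case if yi else alpha_control) for yi in y])

    # Population-level mean methylation (beta value) per cell type and probe.
    cell_means = rng.uniform(0.1, 0.9, size=(3, n_probes))

    # Sample-level variability in each cell-type-specific profile.
    noise = rng.normal(0.0, 0.02, size=(n_samples, 3, n_probes))
    cell_profiles = np.clip(cell_means[None, :, :] + noise, 0.0, 1.0)

    # Bulk methylation = proportion-weighted average of the cell-type profiles.
    bulk = np.einsum("sk,skp->sp", props, cell_profiles)
    return y, props, bulk


def phenotype_t_stats(y, bulk, covariates=None):
    """Per-probe OLS t statistic for the phenotype, optionally adjusting for covariates."""
    n = len(y)
    columns = [np.ones(n), y.astype(float)]
    if covariates is not None:
        columns.extend(np.asarray(covariates, dtype=float).T)
    X = np.column_stack(columns)
    beta, *_ = np.linalg.lstsq(X, bulk, rcond=None)  # fits all probes at once
    resid = bulk - X @ beta
    sigma2 = (resid ** 2).sum(axis=0) / (n - X.shape[1])
    se = np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[1, 1])
    return beta[1] / se


if __name__ == "__main__":
    y, props, bulk = simulate_mixture_confounding()
    t_unadj = phenotype_t_stats(y, bulk)
    # Proportions sum to 1, so drop one column to avoid collinearity with the intercept.
    t_adj = phenotype_t_stats(y, bulk, covariates=props[:, :-1])
    # |t| > 1.96 approximates p < 0.05 at this sample size.
    print("false positives, unadjusted:", int((np.abs(t_unadj) > 1.96).sum()))
    print("false positives, adjusted:  ", int((np.abs(t_adj) > 1.96).sum()))
```

In real data the true proportions are unknown, which is why the methods compared in the study either estimate them from reference profiles or, like surrogate variable analysis, infer the confounding structure from the data; the sketch shows only the idealized oracle adjustment.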
We recommend surrogate variable analysis for cell-type mixture adjustment, since its performance was stable under all our simulated scenarios.