Human Health Program, Health and Biosecurity, CSIRO, Sydney, Australia.
Charles Perkins Centre, The University of Sydney, Sydney, Australia.
Clin Epigenetics. 2022 Apr 29;14(1):58. doi: 10.1186/s13148-022-01277-9.
Genomic technologies can be subject to significant batch-effects which are known to reduce experimental power and to potentially create false positive results. The Illumina Infinium Methylation BeadChip is a popular technology choice for epigenome-wide association studies (EWAS), but presently, little is known about the nature of batch-effects on these designs. Given the subtlety of biological phenotypes in many EWAS, control for batch-effects should be a consideration.
Using the batch-effect removal approaches in the ComBat and Harman software, we examined two in-house datasets and compared the results with three large publicly available datasets (1214 HumanMethylation450 and 1094 MethylationEPIC BeadChips in total). We found that, despite various forms of preprocessing, some batch-effects persist. This residual batch-effect is associated with the day of processing, the individual glass slide and the position of the array on the slide. Consistently across all datasets, 4649 probes required high amounts of correction. To understand the impact of this probe set on EWAS, we explored the literature and found three instances where persistently batch-effect prone probes have been reported in abstracts as key sites of differential methylation. In addition to batch-effect susceptible probes, we also discovered a set of probes which are erroneously corrected. We provide batch-effect workflows for Infinium Methylation data, together with reference matrices of batch-effect prone and erroneously corrected features across the five datasets, which span regionally diverse populations and three commonly collected biosamples (blood, buccal and saliva).
Batch-effects are ever present, even in high-quality data, and a strategy to deal with them should be part of experimental design, particularly for EWAS. Batch-effect removal tools are useful to reduce technical variance in Infinium Methylation data, but they need to be applied with care and make use of post hoc diagnostic measures.
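As a rough illustration of what a post hoc diagnostic might look like, the sketch below computes, for each probe, the fraction of its variance that is explained by batch membership (eta-squared from a one-way layout) and flags probes above a cutoff. This is not the ComBat or Harman method and not the workflow provided with the paper. The function names, the toy values, the probe labels and the 0.5 threshold are all assumptions chosen for illustration.

```python
from statistics import mean


def batch_variance_fraction(values, batches):
    """Fraction of one probe's variance explained by batch (eta-squared)."""
    grand = mean(values)
    total_ss = sum((v - grand) ** 2 for v in values)
    if total_ss == 0:
        return 0.0  # constant probe: nothing to explain
    groups = {}
    for value, batch in zip(values, batches):
        groups.setdefault(batch, []).append(value)
    # Between-batch sum of squares: batch size times squared offset of batch mean.
    between_ss = sum(len(g) * (mean(g) - grand) ** 2 for g in groups.values())
    return between_ss / total_ss


def flag_batch_prone(matrix, batches, threshold=0.5):
    """Return the IDs of probes whose eta-squared exceeds the threshold.

    The threshold is an arbitrary illustrative cutoff, not a published value.
    """
    return [
        probe_id
        for probe_id, values in matrix.items()
        if batch_variance_fraction(values, batches) > threshold
    ]


# Toy data (hypothetical): four arrays on two slides, two made-up probes.
batches = ["slide1", "slide1", "slide2", "slide2"]
toy = {
    "probe_a": [0.1, 0.2, 1.1, 1.2],  # values track the slide
    "probe_b": [0.1, 1.1, 0.2, 1.2],  # values vary within each slide
}

print(flag_batch_prone(toy, batches))
```

Running the same check before and after correction, with day of processing, slide or array position as the batch label, gives a simple per-probe view of how much technical variance remains. It does not detect erroneous correction, which requires comparing corrected values against the known biological design.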