BC Children's Hospital Research Institute, Department of Medical Genetics, University of British Columbia, 950 West 28th Avenue, TRB A5-151, Vancouver, BC, V5Z 4H4, Canada.
Unit of Human Evolutionary Genetics, Institut Pasteur, 75015, Paris, France.
Clin Epigenetics. 2018 Oct 16;10(1):123. doi: 10.1186/s13148-018-0556-2.
The capacity of technologies measuring DNA methylation (DNAm) is rapidly evolving, as are the options for applicable bioinformatics methods. The most commonly used DNAm microarray, the Illumina Infinium HumanMethylation450 (450K array), has recently been replaced by the Illumina Infinium HumanMethylationEPIC (EPIC array), nearly doubling the number of targeted CpG sites. Given that a subset of 450K CpG sites is absent on the EPIC array and that several tools for both data normalization and analyses were developed on the 450K array, it is important to assess their utility when applied to EPIC array data. One of the most commonly used 450K tools is the pan-tissue epigenetic clock, a multivariate predictor of biological age based on DNAm at 353 CpG sites. Of these CpGs, 19 are missing from the EPIC array, thus raising the question of whether EPIC data can be used to accurately estimate DNAm age. We also investigated a 71-CpG epigenetic age predictor, referred to as the Hannum method, which lacks 6 probes on the EPIC array. To evaluate these epigenetic clocks in EPIC data properly, a prior assessment of the effects of data preprocessing methods on DNAm age is also required.
DNAm was quantified, on both the 450K and EPIC platforms, from human primary monocytes derived from 172 individuals. We calculated DNAm age from raw data and from three different preprocessed data forms to assess the effects of different processing methods on the DNAm age estimate. Using an additional cohort, we also investigated DNAm age of peripheral blood mononuclear cells, bronchoalveolar lavage, and bronchial brushing samples using the EPIC array.
Using monocyte-derived data from subjects measured on both the 450K and EPIC arrays, we found that DNAm age was highly correlated across both raw and preprocessed data (r > 0.91). Thus, the correlation between chronological age and the DNAm age estimate is largely unaffected by platform differences and normalization methods. However, we found that the choice of normalization method and measurement platform can lead to a systematic offset in the age estimate, which in turn leads to an increase in the median error. Comparing the 450K and EPIC DNAm age estimates, we observed that the median absolute difference was 1.44-3.10 years across preprocessing methods.
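The distinction drawn here, between a correlation that stays high and an error that grows under a systematic offset, can be illustrated with a minimal sketch. The function names and the toy age values below are hypothetical and are not data from the study; the constant 2-year shift merely stands in for a platform or normalization offset.

```python
from math import sqrt
from statistics import mean, median


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def median_abs_diff(x, y):
    """Median of the paired absolute differences."""
    return median(abs(a - b) for a, b in zip(x, y))


# Made-up DNAm age estimates (years) for five subjects; NOT study data.
age_450k = [30.0, 42.0, 55.0, 61.0, 70.0]
# Assumed constant offset of 2 years on the second platform.
age_epic = [a + 2.0 for a in age_450k]

r = pearson_r(age_450k, age_epic)
mad = median_abs_diff(age_450k, age_epic)
print(f"r = {r:.3f}, median absolute difference = {mad:.2f} years")
```

In this toy case the two sets of estimates are perfectly correlated, yet every subject differs by the full offset, which is why correlation alone does not capture agreement between platforms.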
Here, we have provided evidence that the epigenetic clock is robust to the absence of 19 CpG sites from the EPIC array, and we have highlighted the importance of considering the technical variance of the epigenetic clock when interpreting group differences below the reported error. Furthermore, our study highlights the utility of the epigenetic age acceleration measure, the residuals from a linear regression of DNAm age on chronological age, as the resulting values are robust with respect to normalization methods and measurement platforms.
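As a rough illustration of the age acceleration measure described above, the following sketch fits an ordinary least-squares line of DNAm age on chronological age and returns the residuals. The function name `age_acceleration` and all numeric values are hypothetical, chosen only to show that a constant offset in DNAm age is absorbed by the intercept and leaves the residuals unchanged; it is not the authors' pipeline.

```python
from statistics import mean


def age_acceleration(dnam_age, chron_age):
    """Residuals from a simple linear regression of DNAm age on chronological age."""
    mx, my = mean(chron_age), mean(dnam_age)
    sxx = sum((x - mx) ** 2 for x in chron_age)
    sxy = sum((x - mx) * (y - my) for x, y in zip(chron_age, dnam_age))
    slope = sxy / sxx
    intercept = my - slope * mx
    return [y - (intercept + slope * x) for x, y in zip(chron_age, dnam_age)]


# Made-up values (years) for five subjects; NOT study data.
chron = [25.0, 35.0, 45.0, 55.0, 65.0]
dnam = [27.0, 33.0, 48.0, 54.0, 68.0]
# Assumed constant 3-year offset, standing in for a platform or normalization shift.
shifted = [d + 3.0 for d in dnam]

acc = age_acceleration(dnam, chron)
acc_shifted = age_acceleration(shifted, chron)
print([round(a, 2) for a in acc])
print([round(a, 2) for a in acc_shifted])
```

Because the offset only changes the fitted intercept, both calls give the same residuals up to floating-point rounding. An offset that varied with age would alter the slope as well, so this sketch covers only the constant-shift case.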