Department of Community and Family Medicine; Geisel School of Medicine at Dartmouth College; Lebanon, NH USA.
Epigenetics. 2013 Aug;8(8):816-26. doi: 10.4161/epi.25430. Epub 2013 Jun 25.
The potential influence of underlying differences in relative leukocyte distributions in studies that profile DNA methylation in blood is well recognized and has prompted the development of statistical methods for inferring changes in the distribution of white blood cells from DNA methylation signatures. However, the extent to which this methodology can accurately predict cell-type proportions from blood-derived DNA methylation data in a large-scale epigenome-wide association study (EWAS) has yet to be examined. We used publicly available data deposited in the Gene Expression Omnibus (GEO) database (accession number GSE37008), comprising both blood-derived epigenome-wide DNA methylation data, assayed with the Illumina Infinium HumanMethylation27 BeadArray, and complete blood count (CBC) data for a community cohort of 94 non-diseased individuals. Constrained projection (CP) was used to predict the proportions of lymphocytes, monocytes and granulocytes in each study sample from its DNA methylation signature. The average CBC-derived and CP-predicted percentages of monocytes and lymphocytes were highly consistent (CBC-derived vs. predicted: 17.9% vs. 17.6% for monocytes and 82.1% vs. 81.4% for lymphocytes), with root mean squared errors (rMSE) of 5% and 6% for monocytes and lymphocytes, respectively. The CP-predicted and CBC-derived percentages of monocytes and lymphocytes were also moderately to highly correlated (0.60 and 0.61, respectively), and these results were robust to the number of leukocyte differentially methylated regions (L-DMRs) used for CP prediction. These results further validate the CP approach and highlight its promise for EWAS in which DNA methylation is profiled using whole-blood genomic DNA.
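The CP step is commonly described as a constrained least-squares problem solved for each sample: given the sample's methylation values y at m L-DMR CpGs and a reference matrix B (m × K) of mean methylation for each purified cell type, find proportions w minimizing ||y − Bw||² subject to w_k ≥ 0 and Σ w_k ≤ 1. That formulation is usually solved by quadratic programming; the minimal sketch below instead uses projected gradient descent, a swapped-in solver chosen only to stay dependency-free. The function names, the toy reference matrix, and the step count are hypothetical and are not taken from the study or from any published implementation.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def project_capped_simplex(v: Vector) -> Vector:
    """Euclidean projection of v onto {w : w_k >= 0, sum(w) <= 1}."""
    clipped = [max(x, 0.0) for x in v]
    if sum(clipped) <= 1.0:
        return clipped
    # The sum constraint is active: project onto the probability simplex
    # with the sort-based algorithm (threshold from the largest valid prefix).
    u = sorted(v, reverse=True)
    cumulative, theta = 0.0, 0.0
    for i, ui in enumerate(u, start=1):
        cumulative += ui
        t = (cumulative - 1.0) / i
        if ui - t > 0:
            theta = t
    return [max(x - theta, 0.0) for x in v]


def constrained_projection(y: Vector, B: Matrix, steps: int = 2000) -> Vector:
    """Hypothetical CP-style estimate: min ||y - B w||^2 s.t. w >= 0, sum(w) <= 1.

    y: methylation beta values of one sample at m CpGs.
    B: m x K reference matrix (mean methylation of each cell type per CpG).
    Solved by projected gradient descent, NOT quadratic programming.
    """
    m, k = len(B), len(B[0])
    # ||B||_F^2 bounds the largest eigenvalue of B^T B, the Lipschitz constant
    # of the gradient B^T (B w - y) of the halved squared error, so its
    # reciprocal is a safe (conservative) step size.
    step = 1.0 / sum(b * b for row in B for b in row)
    w = [1.0 / k] * k  # start from equal proportions
    for _ in range(steps):
        resid = [sum(B[i][j] * w[j] for j in range(k)) - y[i] for i in range(m)]
        grad = [sum(B[i][j] * resid[i] for i in range(m)) for j in range(k)]
        w = project_capped_simplex([w[j] - step * grad[j] for j in range(k)])
    return w


if __name__ == "__main__":
    # Hypothetical toy reference profiles (columns: lymphocyte, monocyte,
    # granulocyte) at six CpGs; these are NOT data from GSE37008.
    B = [
        [0.9, 0.1, 0.2],
        [0.1, 0.8, 0.3],
        [0.2, 0.3, 0.9],
        [0.8, 0.2, 0.1],
        [0.1, 0.9, 0.2],
        [0.3, 0.1, 0.8],
    ]
    true_w = [0.6, 0.1, 0.3]
    # Simulate a noise-free sample as a mixture of the reference profiles.
    y = [sum(b * w for b, w in zip(row, true_w)) for row in B]
    print([round(p, 3) for p in constrained_projection(y, B)])
```

Because the simulated sample is an exact mixture of the reference columns, the estimate should recover the mixing proportions; with real data, y is noisy and the recovered proportions are only approximate.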
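The agreement statistics reported in the abstract (mean CBC-derived vs. mean predicted percentage, rMSE, and correlation) can be computed from paired per-sample percentages as in the minimal sketch below. It assumes Pearson correlation, which the abstract does not specify, and the helper names and example values are hypothetical, not data from the study.

```python
import math
from typing import Dict, Sequence


def rmse(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean squared error between paired observed and predicted values."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("need two non-empty sequences of equal length")
    return math.sqrt(
        sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)
    )


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def agreement_summary(
    cbc: Sequence[float], predicted: Sequence[float]
) -> Dict[str, float]:
    """Hypothetical helper: mean CBC vs. mean predicted, rMSE, correlation."""
    return {
        "mean_cbc": sum(cbc) / len(cbc),
        "mean_predicted": sum(predicted) / len(predicted),
        "rmse": rmse(cbc, predicted),
        "correlation": pearson(cbc, predicted),
    }


if __name__ == "__main__":
    # Made-up monocyte percentages for five samples (NOT study data).
    cbc_pct = [15.0, 20.0, 18.0, 22.0, 14.0]
    cp_pct = [16.5, 18.0, 19.0, 20.5, 15.0]
    print(agreement_summary(cbc_pct, cp_pct))
```

Reporting both rMSE and correlation is informative because they capture different things: close group means and a small rMSE indicate little overall bias and error, while the correlation measures how well between-sample variation is tracked.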