Centre for Genomic and Experimental Medicine, Institute of Genetics and Cancer, University of Edinburgh, Crewe Road South, Edinburgh, EH4 2XU, UK.
Institute for Molecular Bioscience, The University of Queensland, Brisbane, 4072, Australia.
Clin Epigenetics. 2022 Aug 10;14(1):100. doi: 10.1186/s13148-022-01320-9.
CpG methylation levels can help to explain inter-individual differences in phenotypic traits. Few studies have explored whether identifying probe subsets based on their biological and statistical properties can maximise predictions whilst minimising array content. Variance component analyses and penalised regression (epigenetic predictors) were used to test the influence of (i) the number of probes considered, (ii) mean probe variability and (iii) methylation QTL status on the variance captured in eighteen traits by blood DNA methylation. Training and test samples comprised ≤ 4450 and ≤ 2578 unrelated individuals from Generation Scotland, respectively.
As the number of probes under consideration decreased, so too did the estimates from variance components and prediction analyses. Methylation QTL status and mean probe variability did not influence variance components. However, relative effect sizes were 15% larger for epigenetic predictors based on probes with known or reported methylation QTLs compared to probes without reported methylation QTLs. Relative effect sizes were 45% larger for predictors based on probes with mean Beta-values between 10 and 90% compared to those based on hypo- or hypermethylated probes (Beta-value ≤ 10% or ≥ 90%).
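The mean Beta-value comparison above rests on splitting probes into an intermediate group (mean Beta-value between 10 and 90%) and a hypo- or hypermethylated group (≤ 10% or ≥ 90%). The following is a minimal sketch of such a split, not the authors' pipeline. It assumes a samples-by-probes matrix of Beta-values on a 0 to 1 scale with no missing values; the function name `split_probes_by_mean_beta` and the toy data are hypothetical.

```python
import numpy as np


def split_probes_by_mean_beta(beta, low=0.10, high=0.90):
    """Split probes by their mean Beta-value across samples.

    beta: array of shape (n_samples, n_probes), Beta-values in [0, 1]
          (assumed complete, i.e. no missing values).
    Returns two boolean masks over probes:
      intermediate: low < mean Beta-value < high
      extreme:      mean Beta-value <= low or >= high
    """
    beta = np.asarray(beta, dtype=float)
    mean_beta = beta.mean(axis=0)  # one mean per probe (column)
    intermediate = (mean_beta > low) & (mean_beta < high)
    return intermediate, ~intermediate


# Toy data (illustrative only): 50 samples, 3 probes.
rng = np.random.default_rng(0)
toy_beta = np.column_stack([
    rng.uniform(0.00, 0.05, 50),  # hypomethylated probe
    rng.uniform(0.40, 0.60, 50),  # intermediate probe
    rng.uniform(0.95, 1.00, 50),  # hypermethylated probe
])

intermediate_mask, extreme_mask = split_probes_by_mean_beta(toy_beta)
intermediate_probes = toy_beta[:, intermediate_mask]  # candidate predictor inputs
```

Either mask can then be used to subset the methylation matrix before fitting a penalised regression model, so that predictors built from each probe group can be compared on the same test sample.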
Arrays with fewer probes could reduce costs, leading to increased sample sizes for analyses. Our results show that reducing array content can restrict prediction metrics, and that careful attention must be given to the biological and distributional properties of CpG probes when selecting array content.