Centre for Fertility and Health, Norwegian Institute of Public Health, Oslo, Norway.
Institute of Health and Society, University of Oslo, Oslo, Norway.
Clin Epigenetics. 2023 Jul 13;15(1):114. doi: 10.1186/s13148-023-01528-3.
DNA methylation (DNAm) is robustly associated with chronological age in children and adults, and with gestational age (GA) in newborns. This property has enabled the development of several epigenetic clocks that can accurately predict chronological age and GA. However, why the predictive CpGs overlap so little across different epigenetic clocks remains unclear. Our main aim was therefore to identify and characterize CpGs that are stably predictive of GA.
We applied a statistical approach called 'stability selection' to DNAm data from 2138 newborns in the Norwegian Mother, Father, and Child Cohort Study. Stability selection combines subsampling with variable selection to restrict the number of false discoveries in the set of selected variables. Twenty-four CpGs were identified as being stably predictive of GA. Intriguingly, at most 10% of the CpGs in previous GA clocks were stably selected. Based on these results, we used generalized additive model regression to develop a new GA clock consisting of only five CpGs, which showed predictive performance similar to that of previous GA clocks (R = 0.674, median absolute deviation = 4.4 days). These CpGs were in or near genes and regulatory regions involved in immune responses, metabolism, and developmental processes. Furthermore, accounting for nonlinear associations improved prediction performance in preterm newborns.
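The general idea of stability selection (repeated subsampling combined with a variable selector, keeping only variables that are chosen in most subsamples) can be illustrated with a minimal sketch. This is not the pipeline used in the study: the base selector here is a simple top-k ranking by absolute Pearson correlation, swapped in for brevity, and the function names, the synthetic data, the half-sample size, and the 0.8 frequency threshold are all assumptions made for illustration.

```python
import random


def pearson(x, y):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def top_k_by_correlation(X, y, k):
    """Hypothetical base selector: indices of the k features most correlated with y."""
    p = len(X[0])
    scores = [abs(pearson([row[j] for row in X], y)) for j in range(p)]
    return sorted(range(p), key=lambda j: scores[j], reverse=True)[:k]


def stability_selection(X, y, k=3, n_subsamples=100, threshold=0.8, seed=0):
    """Run the base selector on random half-samples and keep features whose
    selection frequency reaches the threshold."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    counts = [0] * p
    for _ in range(n_subsamples):
        idx = rng.sample(range(n), n // 2)  # subsample without replacement
        chosen = top_k_by_correlation([X[i] for i in idx], [y[i] for i in idx], k)
        for j in chosen:
            counts[j] += 1
    freqs = [c / n_subsamples for c in counts]
    stable = [j for j, f in enumerate(freqs) if f >= threshold]
    return stable, freqs


# Synthetic stand-in for a methylation matrix: 200 samples, 20 features,
# of which only features 0 and 1 carry signal for the outcome.
data_rng = random.Random(1)
n_samples, n_features = 200, 20
X = [[data_rng.gauss(0, 1) for _ in range(n_features)] for _ in range(n_samples)]
y = [2.0 * row[0] - 1.5 * row[1] + data_rng.gauss(0, 0.5) for row in X]

stable, freqs = stability_selection(X, y)
print("stably selected feature indices:", stable)
```

In this toy setting the two informative features are picked in essentially every subsample, whereas noise features only occasionally fill the remaining slot, so their selection frequencies stay low. In practice the base selector is typically a penalized regression, and the threshold is chosen to bound the expected number of false discoveries.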
We present a methodological framework for feature selection that is broadly applicable to any trait that can be predicted from DNAm data. We demonstrate its utility by identifying CpGs that are highly predictive of GA and present a new, highly performant GA clock based on only five CpGs, which makes it more amenable to use in a clinical setting.