Department of Information and Computer Science, University of Science and Technology Beijing, Beijing 100083, China.
Genes (Basel). 2021 Jun 6;12(6):870. doi: 10.3390/genes12060870.
In recent years, epigenetic studies have found a close correlation between DNA methylation and aging. As research on DNA methylation has deepened, researchers have established quantitative statistical relationships for predicting an individual's age. This work used human blood tissue samples to study the association between age and DNA methylation. We built two predictors, one based on healthy data and one on disease data. For the healthy data, we retrieved a total of 1191 samples from four previous reports. By calculating the Pearson correlation coefficient between age and DNA methylation values, we selected 111 age-related CpG sites. A predictive model built with gradient boosting regression obtained an R value of 0.86 and a MAD of 3.90 years on the testing dataset, outperforming four other regression methods as well as Horvath's results. For the disease data, 354 rheumatoid arthritis samples were retrieved from a previous study. Then, 45 CpG sites were selected to build the predictor; the corresponding MAD and R on the testing dataset were 3.11 years and 0.89, respectively, which showed the robustness of our predictor. These results were also better than those of the four other regression methods. Finally, we analyzed the 24 CpG sites common to the healthy and disease datasets, which illustrated the functional relevance of the selected CpG sites.
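The site-selection step and the two reported metrics can be illustrated with a minimal sketch. It is not the authors' code: the site names, the toy methylation values, and the correlation cutoff of 0.5 are hypothetical, and MAD is assumed here to mean the mean absolute deviation between predicted and chronological age, since the abstract does not expand the acronym. The regression step itself (gradient boosting) is omitted.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant series has no defined correlation; treat as 0
    return sxy / math.sqrt(sxx * syy)


def select_sites(beta, ages, threshold):
    """Keep sites whose |r| with age is at least `threshold`.

    beta: dict mapping a site name to its methylation values, one per sample.
    threshold: hypothetical cutoff; the abstract does not state the one used.
    """
    return [site for site, values in beta.items()
            if abs(pearson(values, ages)) >= threshold]


def mean_absolute_deviation(predicted, actual):
    """Assumed meaning of MAD: mean of |predicted age - actual age|, in years."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


# Toy data (hypothetical): 60 samples aged 20..79 and three made-up sites.
ages = list(range(20, 80))
beta = {
    "site_up": [0.2 + 0.005 * a for a in ages],    # methylation rises with age
    "site_down": [0.9 - 0.004 * a for a in ages],  # methylation falls with age
    "site_flat": [0.5 if i % 2 == 0 else 0.6 for i in range(len(ages))],
}

selected = select_sites(beta, ages, threshold=0.5)
print(selected)

# R and MAD for a toy set of predictions against true ages.
predicted_ages = [30, 42, 55]
true_ages = [32, 40, 55]
print(pearson(predicted_ages, true_ages))
print(mean_absolute_deviation(predicted_ages, true_ages))
```

Taking the absolute value of the correlation keeps sites that lose methylation with age as well as those that gain it. In a full pipeline, the selected sites would become the input features of the regression model, and R and MAD would be computed on a held-out testing set.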