Han Xueli, Xiao Chao, Yi Shaohua, Li Ya, Chen Maomin, Huang Daixin
Department of Forensic Medicine, Tongji Medical College, Huazhong University of Science and Technology, No. 13, Hangkong Road, Wuhan, 430030, China.
Int J Legal Med. 2022 Nov;136(6):1655-1665. doi: 10.1007/s00414-022-02865-3. Epub 2022 Jul 11.
Age-related CpG sites (AR-CpGs) are currently the most promising biomarkers for forensic age estimation. In our previous studies, we first validated the age correlation of seven reported AR-CpGs in blood samples from the Chinese Han population. We then screened several good age predictors from blood samples of the same population and built pyrosequencing-based age prediction models. However, it remains important to select a set of high-performance AR-CpGs for a specific racial group and to establish a simple and efficient method for accurate age estimation for forensic purposes. In this study, eight AR-CpGs, namely chr6: 11,044,628 (ELOVL2), cg06639320 (FHL2), chr1: 207,823,723 (C1orf132), cg19283806 (CCDC102B), cg14361627 (KLF14), cg17740900 (SYNE2), cg07553761 (TRIM59), and cg26947034, were selected based on our previous studies, and a multiplex methylation SNaPshot assay was developed to measure DNA methylation levels at these AR-CpGs in 529 blood samples (donors aged 2-82 years) from the Chinese Han population. All selected CpG sites showed strong age correlation, with correlation coefficients (r) ranging from 0.8363 to 0.9251. Multiple linear regression (MLR) and support vector regression (SVR) age prediction models were both established to fit the age-related changes in DNA methylation levels at the eight AR-CpGs in blood samples from 374 donors. The MLR model predicted age with R = 0.923 and a mean absolute error (MAE) of 3.52 years, while the SVR model predicted age with R = 0.935 and an MAE of 2.88 years. One hundred fifty-five independent samples were used as a validation set to test both models; the validation MAE was 3.71 and 3.34 years for the MLR and SVR models, respectively. The correct prediction rate at ± 5 years reached 79.35% and 83.23% for the MLR and SVR models, respectively. Overall, these statistical parameters indicate that the SVR model outperformed the MLR model in age prediction for the Chinese Han population. In addition, our method provides sufficient sensitivity for forensic applications and achieved 100% efficiency when examining bloodstains kept under room conditions for up to 43 days. These results indicate that our multiplex methylation SNaPshot assay is a reliable, effective, and accurate method for age prediction in blood samples from the Chinese Han population.
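The two performance measures quoted above, MAE and the correct prediction rate at ± 5 years, can be illustrated with a minimal Python sketch. It is not the authors' code. The function names and toy ages are hypothetical, and treating an error of exactly 5 years as correct is an assumption. The final check only shows that the reported validation rates are numerically consistent with 123 and 129 correct predictions out of 155 samples; those counts are inferred here and are not stated in the abstract.

```python
from typing import Sequence


def mean_absolute_error(true_ages: Sequence[float],
                        predicted_ages: Sequence[float]) -> float:
    """Average of |predicted - true| over all samples, in years."""
    if len(true_ages) == 0 or len(true_ages) != len(predicted_ages):
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(p - t) for t, p in zip(true_ages, predicted_ages))
    return total / len(true_ages)


def correct_prediction_rate(true_ages: Sequence[float],
                            predicted_ages: Sequence[float],
                            tolerance_years: float = 5.0) -> float:
    """Percentage of samples predicted within +/- tolerance_years.

    Assumption: an error exactly equal to the tolerance counts as correct.
    """
    if len(true_ages) == 0 or len(true_ages) != len(predicted_ages):
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(1 for t, p in zip(true_ages, predicted_ages)
               if abs(p - t) <= tolerance_years)
    return 100.0 * hits / len(true_ages)


# Hypothetical toy data (NOT from the study): chronological vs. predicted ages.
true_ages = [10.0, 25.0, 40.0, 60.0, 80.0]
predicted_ages = [12.0, 24.0, 47.0, 58.0, 77.0]

mae = mean_absolute_error(true_ages, predicted_ages)       # errors: 2, 1, 7, 2, 3
rate = correct_prediction_rate(true_ages, predicted_ages)  # 4 of 5 within 5 years

# Sample sizes from the abstract: training set + validation set = all samples.
training_n, validation_n, total_n = 374, 155, 529
assert training_n + validation_n == total_n

# Inferred counts (an assumption, not reported): correct predictions that
# would reproduce the stated validation rates of 79.35% and 83.23%.
mlr_hits, svr_hits = 123, 129
mlr_rate = round(100 * mlr_hits / validation_n, 2)
svr_rate = round(100 * svr_hits / validation_n, 2)

print(f"toy MAE = {mae:.2f} years, toy rate = {rate:.1f}%")
print(f"MLR rate = {mlr_rate}%, SVR rate = {svr_rate}%")
```

Under these definitions a lower MAE and a higher ± 5 year rate both point the same way, which is the basis on which the abstract ranks the SVR model above the MLR model.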