Li Ya, Liu Xiaozhao, Chen Maomin, Yi Shaohua, He Ximiao, Xiao Chao, Huang Daixin
Department of Forensic Medicine, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, Hubei 430030, PR China.
Department of Physiology, School of Basic Medicine, Tongji Medical College, Huazhong University of Science and Technology, Wuhan 430030, PR China; Center for Genomics and Proteomics Research, School of Basic Medicine, Tongji Medical College, Huazhong University of Science and Technology, Wuhan 430030, PR China; Hubei Key Laboratory of Drug Target Research and Pharmacodynamic Evaluation, Huazhong University of Science and Technology, Wuhan 430030, PR China.
Forensic Sci Int Genet. 2025 Mar;76:103215. doi: 10.1016/j.fsigen.2024.103215. Epub 2024 Dec 25.
DNA methylation at age-related CpG (AR-CpG) sites holds significant promise for forensic age estimation. However, somatic models perform poorly in semen due to unique methylation dynamics during spermatogenesis, and current studies are constrained by the limited coverage of methylation microarrays. This study aimed to identify novel semen-specific AR-CpG sites using double-enzyme reduced representation bisulfite sequencing (dRRBS) and to validate these markers, alongside previously reported sites and neighboring CpGs, using bisulfite amplicon sequencing (BSAS) in order to develop robust age estimation models. A methylome-wide association study was conducted on semen samples from 21 healthy Chinese men across three age groups, generating over 4 million CpG sites per sample at ≥5× depth. Analysis of 721,840 shared CpG sites revealed that more than 95% were not covered by conventional methylation microarrays. Differential methylation and correlation analyses identified 139 AR-CpG sites. A two-stage validation process using multiplex PCR-based BSAS was performed. In the first stage, 47 top dRRBS-identified AR-CpG sites, 26 literature-reported sites, and 242 neighboring CpGs were assessed in 129 semen samples (22-64 years), validating 31 dRRBS-identified, 26 literature-reported, and 152 neighboring CpGs as age-related. The second stage examined 154 CpG sites in 247 samples (22-67 years), confirming 71 AR-CpG sites with |rho| > 0.50. Among these, chr2:129071885 (cg19998819) emerged as the strongest age-associated marker (rho = 0.81). Using the second BSAS dataset, age estimation models were developed with multiple linear regression and random forest (RF) algorithms within a repeated nested cross-validation (CV) framework (10-fold outer CV with 10-fold inner CV, repeated 10 times). The RF models demonstrated superior accuracy across feature subsets of 5-25 CpGs. The optimized 9-CpG RF model achieved an average root mean square error of 4.73 years (4.62-4.96, SD = 0.10) and an average mean absolute error of 3.30 years (3.23-3.43, SD = 0.06). This study demonstrates the utility of dRRBS for large-scale AR-CpG discovery and provides a robust age estimation model and a comprehensive reference database of semen-specific AR-CpG sites for forensic applications.
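The repeated nested cross-validation scheme described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' code: the use of scikit-learn, the synthetic methylation data, the hyperparameter grid, the inner-loop scoring metric, and the choice to pool out-of-fold predictions within each repeat before computing RMSE and MAE are all assumptions made for illustration. The function names `make_synthetic_data` and `repeated_nested_cv` are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error
from sklearn.model_selection import GridSearchCV, RepeatedKFold


def make_synthetic_data(n_samples=120, n_cpgs=9, seed=0):
    """Hypothetical stand-in for BSAS methylation levels (NOT the study's data)."""
    rng = np.random.default_rng(seed)
    age = rng.uniform(22, 67, size=n_samples)
    base = rng.uniform(0.2, 0.8, size=n_cpgs)          # baseline methylation
    slopes = rng.uniform(-0.005, 0.005, size=n_cpgs)   # change per year
    noise = rng.normal(0.0, 0.03, size=(n_samples, n_cpgs))
    X = np.clip(base + np.outer(age, slopes) + noise, 0.0, 1.0)
    return X, age


def repeated_nested_cv(X, y, outer_folds=10, inner_folds=10, repeats=10, seed=0):
    """Return per-repeat RMSE and MAE arrays from repeated nested CV."""
    # Assumed grid; the abstract does not state which hyperparameters were tuned.
    param_grid = {"max_features": [1, 3], "min_samples_leaf": [1, 5]}
    outer = RepeatedKFold(n_splits=outer_folds, n_repeats=repeats, random_state=seed)
    rmse, mae = [], []
    preds = np.empty_like(y, dtype=float)
    for i, (train, test) in enumerate(outer.split(X)):
        # Inner loop: tune hyperparameters using the outer-training data only.
        search = GridSearchCV(
            RandomForestRegressor(n_estimators=50, random_state=seed),
            param_grid,
            cv=inner_folds,
            scoring="neg_mean_absolute_error",
        )
        search.fit(X[train], y[train])
        # Outer loop: predict the held-out fold with the refit best model.
        preds[test] = search.predict(X[test])
        # Every `outer_folds` splits, one repeat is complete: score it.
        if (i + 1) % outer_folds == 0:
            rmse.append(np.sqrt(mean_squared_error(y, preds)))
            mae.append(mean_absolute_error(y, preds))
    return np.array(rmse), np.array(mae)


if __name__ == "__main__":
    X, y = make_synthetic_data()
    # Smaller settings than the 10 x 10 x 10 scheme, to keep the demo quick.
    rmse, mae = repeated_nested_cv(X, y, outer_folds=5, inner_folds=3, repeats=2)
    print(f"RMSE mean={rmse.mean():.2f} SD={rmse.std(ddof=1):.2f}")
    print(f"MAE  mean={mae.mean():.2f} SD={mae.std(ddof=1):.2f}")
```

Keeping hyperparameter tuning inside the inner loop means each outer test fold is never seen during model selection, which is the reason nested CV is generally preferred when tuning and error estimation share one dataset. Averaging the per-repeat errors and reporting their range and SD mirrors the form of the figures reported above, though the exact aggregation used in the study is not specified in the abstract.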