Systematic feature selection improves accuracy of methylation-based forensic age estimation in Han Chinese males.

Affiliations

National Engineering Laboratory for Forensic Science, Key Laboratory of Forensic Genetics of Ministry of Public Security, Beijing Engineering Research Center of Crime Scene Evidence Examination, Institute of Forensic Science, Ministry of Public Security, Beijing, China.

CAS Key Laboratory of Genomic and Precision Medicine, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, China; University of Chinese Academy of Sciences, Beijing, China.

Publication information

Forensic Sci Int Genet. 2018 Jul;35:38-45. doi: 10.1016/j.fsigen.2018.03.009. Epub 2018 Mar 23.

Abstract

Estimating individual age from biomarkers may provide key information facilitating forensic investigations. Recent progress has shown DNA methylation at age-associated CpG sites to be the most informative biomarker for estimating the age of an unknown donor. Optimal feature selection plays a critical role in determining the performance of the final prediction model. In this study, we investigated methylation levels at 153 age-associated CpG sites from 21 previously reported genomic regions, measured with the EpiTYPER system, for their power to predict individual age in 390 Han Chinese males ranging from 15 to 75 years of age. We conducted a systematic feature selection using a stepwise backward multiple linear regression analysis as well as an exhaustive search algorithm. Both approaches identified the same subset of 9 CpG sites, which in linear combination provided the optimal model fit, with a mean absolute deviation (MAD) of 2.89 years of age and explained variance (R²) of 0.92. The final model was validated in two independent Han Chinese male samples (validation set 1, N = 65, MAD = 2.49, R² = 0.95, and validation set 2, N = 62, MAD = 3.36, R² = 0.89). Competing models such as support vector machine and artificial neural network did not outperform the linear model to any noticeable degree. Validation set 1 was additionally analyzed using pyrosequencing technology for cross-platform validation and was termed validation set 3. Directly applying our model, which was built on methylation levels measured by the EpiTYPER system, to the pyrosequencing data gave less accurate results in terms of MAD (validation set 3, N = 65 Han Chinese males, MAD = 4.20, R² = 0.93), suggesting the presence of a batch effect between data generation platforms. This batch effect could be partially overcome by a z-score transformation (MAD = 2.76, R² = 0.93). Overall, our systematic feature selection identified 9 CpG sites as the optimal subset for forensic age estimation, and the prediction model consisting of these 9 markers demonstrated high potential in forensic practice. An age estimator implementing our prediction model and allowing missing markers is freely available at http://liufan.big.ac.cn/AgePrediction.
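The workflow described in the abstract (backward elimination over a multiple linear regression, scoring by MAD, and a per-platform z-score transformation) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not state the elimination criterion, so the sketch assumes the column whose removal raises the training MAD the least is dropped at each step. The data are synthetic, and all function names are hypothetical.

```python
import numpy as np

def fit_linear(X, y):
    """Ordinary least squares with an intercept; returns [intercept, slopes...]."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict(coef, X):
    return coef[0] + X @ coef[1:]

def mad(y_true, y_pred):
    """Mean absolute deviation between known and predicted ages, in years."""
    return float(np.mean(np.abs(y_true - y_pred)))

def backward_select(X, y, n_keep):
    """Assumed criterion: repeatedly drop the column whose removal
    increases the training MAD the least, until n_keep columns remain."""
    keep = list(range(X.shape[1]))
    while len(keep) > n_keep:
        scores = []
        for j in keep:
            trial = [k for k in keep if k != j]
            coef = fit_linear(X[:, trial], y)
            scores.append((mad(y, predict(coef, X[:, trial])), j))
        _, least_useful = min(scores)
        keep.remove(least_useful)
    return keep

def zscore(X):
    """Standardize each CpG column to mean 0 and SD 1 within one dataset."""
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

# Synthetic stand-in data: 3 age-dependent "CpG" columns plus 5 pure-noise columns.
rng = np.random.default_rng(0)
n = 200
age = rng.uniform(15, 75, n)
informative = np.column_stack([
    0.2 + 0.005 * age + rng.normal(0, 0.02, n),
    0.8 - 0.004 * age + rng.normal(0, 0.02, n),
    0.5 + 0.003 * age + rng.normal(0, 0.02, n),
])
X = np.column_stack([informative, rng.uniform(0, 1, (n, 5))])

selected = backward_select(X, age, n_keep=3)
coef = fit_linear(X[:, selected], age)
print("selected columns:", sorted(selected))
print("training MAD:", round(mad(age, predict(coef, X[:, selected])), 2))

# A second "platform" modeled as a systematic shift and scale of the same values;
# per-column z-scores are unchanged by such an affine offset.
X_other = 0.9 * X + 0.05
print(np.allclose(zscore(X), zscore(X_other)))
```

Standardizing within each dataset removes a platform-wide offset or scaling of this kind, but it also ties the result to the composition of each sample, which is one plausible reason such a correction may only partially remove a batch effect.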
