DNA methylation-based age prediction using massively parallel sequencing data and multiple machine learning models.

Author information

King's Forensics, Department of Analytical, Environmental and Forensic Sciences, Faculty of Life Sciences and Medicine, King's College London, 150 Stamford Street, London SE1 9NH, United Kingdom.

Publication information

Forensic Sci Int Genet. 2018 Nov;37:215-226. doi: 10.1016/j.fsigen.2018.09.003. Epub 2018 Sep 8.

Abstract

The field of DNA intelligence focuses on retrieving information from DNA evidence that can help narrow down large groups of suspects or define target groups of interest. With recent breakthroughs in the estimation of geographical ancestry and physical appearance, the estimation of chronological age completes this circle of information. Recent studies have identified methylation sites in the human genome that correlate strongly with age and can be used to develop age-estimation algorithms. In this study, 110 whole blood samples from individuals aged 11-93 years were analysed using a DNA methylation quantification assay based on bisulphite conversion and massively parallel sequencing (Illumina MiSeq) of 12 CpG sites. Using these data, 17 different statistical modelling approaches were compared on root mean square error (RMSE), and a Support Vector Machine with polynomial function (SVMp) model was selected for further testing. For the selected model (RMSE = 4.9 years), the mean absolute error (MAE) of the blind test (n = 33) was 4.1 years, with 52% of the samples predicted with less than 4 years of error and 86% with less than 7 years. Furthermore, the sensitivity of the method was assessed in terms of both methylation quantification accuracy and prediction accuracy, in the first validation of this kind. The described method retained its accuracy down to 10 ng of initial DNA input or ∼2 ng bisulphite PCR input. Finally, 34 saliva samples were analysed and, following basic normalisation, the chronological age of the donors was predicted with less than 4 years of error for 50% of the samples and with less than 7 years of error for 70%.
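The error measures quoted in the abstract (RMSE, MAE, and the share of samples predicted within a given number of years) can be illustrated with a minimal Python sketch. The function names and the four age pairs are hypothetical and chosen only for illustration; they are not data or code from the study, and the sketch assumes the usual definitions of these metrics.

```python
import math


def rmse(true_ages, predicted_ages):
    """Root mean square error: square root of the mean squared difference."""
    squared = [(p - t) ** 2 for t, p in zip(true_ages, predicted_ages)]
    return math.sqrt(sum(squared) / len(squared))


def mae(true_ages, predicted_ages):
    """Mean absolute error: mean of the absolute differences."""
    absolute = [abs(p - t) for t, p in zip(true_ages, predicted_ages)]
    return sum(absolute) / len(absolute)


def fraction_within(true_ages, predicted_ages, max_error_years):
    """Share of samples whose absolute error is below max_error_years."""
    hits = sum(
        1 for t, p in zip(true_ages, predicted_ages)
        if abs(p - t) < max_error_years
    )
    return hits / len(true_ages)


# Made-up chronological and predicted ages (years), NOT values from the study.
true_ages = [20, 35, 50, 65]
predicted_ages = [23, 33, 58, 64]

print(f"RMSE: {rmse(true_ages, predicted_ages):.2f} years")
print(f"MAE: {mae(true_ages, predicted_ages):.2f} years")
print(f"Within 4 years: {fraction_within(true_ages, predicted_ages, 4):.0%}")
print(f"Within 7 years: {fraction_within(true_ages, predicted_ages, 7):.0%}")
```

Because RMSE squares each error before averaging, a single large miss (8 years in this toy set) raises it more than it raises MAE, which is one reason the two figures reported for a model can differ.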
