DNA methylation-based forensic age prediction using artificial neural networks and next generation sequencing.

Author information

Vidaki Athina, Ballard David, Aliferi Anastasia, Miller Thomas H, Barron Leon P, Syndercombe Court Denise

Affiliation

Department of Pharmacy and Forensic Science, King's College London, Franklin-Wilkins Building, 150 Stamford Street, London, UK.

Department of Pharmacy and Forensic Science, King's College London, Franklin-Wilkins Building, 150 Stamford Street, London, UK.

Publication information

Forensic Sci Int Genet. 2017 May;28:225-236. doi: 10.1016/j.fsigen.2017.02.009. Epub 2017 Feb 28.

Abstract

The ability to estimate the age of the donor of biological material recovered at a crime scene can be of substantial value in forensic investigations. Aging is complex and is associated with various molecular modifications in cells that accumulate over a person's lifetime, including changes in epigenetic patterns. The aim of this study was to use age-specific DNA methylation patterns to generate an accurate model for predicting chronological age from whole blood data. In total, 45 age-associated CpG sites were selected based on their reported age coefficients in a previous extensive study and investigated using publicly available methylation data from 1156 whole blood samples (aged 2-90 years) analysed on Illumina's genome-wide methylation platforms (27K/450K). Using stepwise regression for variable selection, we identified 23 of these CpG sites that contributed significantly to age prediction modelling, and multiple regression analysis with these markers provided an accurate prediction of age (R=0.92, mean absolute error (MAE)=4.6 years). However, applying machine learning, specifically a generalised regression neural network (GRNN) model, significantly improved age prediction (R=0.96), with an MAE of 3.3 years for the training set and 4.4 years for a blind test set of 231 cases. The machine learning approach used 16 CpG sites located in 16 different genomic regions, and the top 3 predictors of age belonged to the genes NHLRC1, SCGN and CSNK1D. The proposed model was further tested using independent cohorts of 53 monozygotic twins (MAE=7.1 years) and 1011 individuals in disease states (MAE=7.2 years). Furthermore, we highlighted the potential applicability of the age markers to samples other than blood by predicting age with similar accuracy in 265 saliva samples (R=0.96), with an MAE of 3.2 years (training set) and 4.0 years (blind test).
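The contrast between the multiple regression model and the GRNN can be illustrated with a minimal NumPy sketch. It assumes methylation values (beta values between 0 and 1) for the selected CpG sites are already arranged as a samples × sites matrix, with chronological age as the target. The function names and the smoothing parameter `sigma` are hypothetical choices for illustration, not the authors' implementation; the GRNN shown follows Specht's general formulation, in which the prediction is a Gaussian-kernel weighted average of the training ages.

```python
import numpy as np


def fit_linear(X_train, y_train):
    """Ordinary least-squares multiple regression: age ~ intercept + CpG betas."""
    X = np.column_stack([np.ones(len(X_train)), np.asarray(X_train, dtype=float)])
    coef, *_ = np.linalg.lstsq(X, np.asarray(y_train, dtype=float), rcond=None)
    return coef  # coef[0] is the intercept, coef[1:] has one weight per CpG site


def predict_linear(coef, X_query):
    Xq = np.atleast_2d(np.asarray(X_query, dtype=float))
    return coef[0] + Xq @ coef[1:]


def grnn_predict(X_train, y_train, X_query, sigma):
    """GRNN prediction: Gaussian-kernel weighted mean of training ages.

    Every training sample acts as a pattern unit and `sigma` is the single
    smoothing parameter (equivalent to Nadaraya-Watson kernel regression).
    """
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train, dtype=float)
    X_query = np.atleast_2d(np.asarray(X_query, dtype=float))
    # Squared Euclidean distance from each query row to each training row.
    d2 = ((X_query[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    # Pattern layer: Gaussian activations. Subtracting each row's minimum
    # distance scales the row by a constant that cancels in the ratio below
    # and guarantees at least one weight equals 1 (no division by zero).
    w = np.exp(-(d2 - d2.min(axis=1, keepdims=True)) / (2.0 * sigma ** 2))
    # Summation and output layers: weighted average of the training ages.
    return (w @ y_train) / w.sum(axis=1)


def mae(y_true, y_pred):
    """Mean absolute error in years."""
    return float(np.mean(np.abs(np.asarray(y_true, float) - np.asarray(y_pred, float))))


def pearson_r(y_true, y_pred):
    """Pearson correlation between chronological and predicted age."""
    return float(np.corrcoef(y_true, y_pred)[0, 1])
```

In a setup like this, `sigma` would typically be tuned on held-out data (for example by minimising MAE on a validation split), and reported MAE and R values would come from a blind test set that took no part in fitting or tuning. A small `sigma` makes the GRNN behave like a nearest-neighbour lookup, while a very large `sigma` pulls every prediction towards the mean training age.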
In an attempt to create a sensitive and accurate age prediction test, we developed a next generation sequencing (NGS)-based method on the Illumina MiSeq platform that quantifies the methylation status of the 16 selected CpG sites. The method was validated using DNA standards of known methylation levels, and its age prediction accuracy was initially assessed in a set of 46 whole blood samples. Although the resulting prediction accuracy using the NGS data was lower than that of the original model (MAE=7.5 years), we expect that future optimization of our strategy to account for technical variation, together with a larger sample size, will improve both prediction accuracy and reproducibility.
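For the NGS step, methylation at a single CpG after bisulfite conversion is commonly quantified as the fraction of reads that retain a C at the CpG cytosine, since unmethylated cytosines are read as T. The sketch below assumes per-site C and T read counts have already been extracted from aligned MiSeq reads; all function names are hypothetical. The linear calibration against DNA standards of known methylation is offered only as one possible way of correcting technical bias, an assumption rather than the authors' published procedure.

```python
import numpy as np


def methylation_level(c_reads, t_reads):
    """Fraction methylated per CpG from bisulfite-sequencing read counts.

    Assumes counts are taken on the strand where the CpG cytosine is read
    as C (methylated) or T (unmethylated); on the opposite strand one would
    count G and A instead.
    """
    c = np.asarray(c_reads, dtype=float)
    t = np.asarray(t_reads, dtype=float)
    total = c + t
    # Sites with no coverage get NaN rather than a misleading 0.
    return np.divide(c, total, out=np.full_like(total, np.nan), where=total > 0)


def fit_calibration(expected, observed):
    """Least-squares line mapping observed -> expected methylation of standards."""
    slope, intercept = np.polyfit(observed, expected, deg=1)
    return slope, intercept


def apply_calibration(observed, slope, intercept):
    """Correct observed levels and keep them within the valid 0-1 range."""
    return np.clip(slope * np.asarray(observed, dtype=float) + intercept, 0.0, 1.0)
```

Under these assumptions, the calibrated levels for the 16 sites of a new sample would form one query row for the trained age model; any systematic difference between array-based beta values and sequencing-based levels that the calibration does not remove would feed directly into the age prediction error.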
