CAS Key Laboratory of Genomic and Precision Medicine, Beijing Institute of Genomics, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Beijing, China; Department of Genetic Identification, Erasmus MC University Medical Center Rotterdam, Rotterdam, the Netherlands.
Department of Genetic Identification, Erasmus MC University Medical Center Rotterdam, Rotterdam, the Netherlands.
Forensic Sci Int Genet. 2019 Sep;42:8-13. doi: 10.1016/j.fsigen.2019.05.006. Epub 2019 Jun 1.
Predicting adult height from DNA has important implications for forensic DNA phenotyping. In 2014, we introduced a prediction model consisting of 180 height-associated SNPs, based on data from 10,361 Northwestern Europeans enriched with tall individuals (770 with height more than 1.88 standard deviations above the mean), which yielded mid-range accuracy (AUC = 0.75 for binary prediction of tall stature and R² = 0.12 for quantitative prediction of adult height). Here, we provide an update on DNA-based height predictability by considering an enlarged list of subsequently published height-associated SNPs, using data from the same set of 10,361 Europeans. A prediction model based on the full set of 689 SNPs showed improved accuracy relative to previous models for both tall stature (AUC = 0.79) and quantitative height (R² = 0.21). A feature selection analysis revealed a subset of the 412 most informative SNPs, and the corresponding prediction model retained most of the accuracy achieved with the full model (AUC = 0.76 and R² = 0.19). Overall, our study empirically illustrates that the accuracy of predicting human appearance phenotypes with very complex underlying genetic architectures, such as adult height, can be improved by increasing the number of phenotype-associated DNA variants. Our work also demonstrates that careful sub-selection allows the number of DNA predictors to be reduced considerably while achieving prediction accuracy similar to that provided by the full set. This is forensically relevant because of restrictions on the number of SNPs that can be analyzed simultaneously with forensically suitable DNA technologies in the current era of targeted massively parallel sequencing in forensic genetics.
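The abstract reports accuracy as an AUC for the binary tall/not-tall outcome and as a variance-explained statistic for quantitative height. The sketch below shows how such metrics can be computed from SNP genotypes. It rests on several assumptions not stated in the abstract. It uses a simple additive score of effect-allele counts with hypothetical per-SNP weights, not the authors' fitted model. It defines quantitative accuracy as the squared Pearson correlation between predicted and observed height. It treats the 1.88 SD cut-off, which the abstract uses to describe the tall subgroup, as a classification threshold. All function names here are illustrative.

```python
from typing import Sequence


def allele_score(genotypes: Sequence[int], weights: Sequence[float]) -> float:
    """Additive polygenic score: sum of effect-allele counts (0, 1 or 2) times
    per-SNP weights. The weights are hypothetical inputs, not published effects."""
    if len(genotypes) != len(weights):
        raise ValueError("one weight per SNP is required")
    return sum(g * w for g, w in zip(genotypes, weights))


def is_tall(height: float, mean: float, sd: float, threshold: float = 1.88) -> bool:
    """Assumed binary outcome: height more than `threshold` SDs above the mean."""
    return (height - mean) / sd > threshold


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the probability that a random case (label 1) scores higher than a
    random control (label 0); ties count as half a win (Mann-Whitney form)."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))


def r_squared(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Squared Pearson correlation between observed and predicted values
    (an assumed definition of the quantitative accuracy measure)."""
    n = len(observed)
    if n != len(predicted) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov * cov / (var_o * var_p)


if __name__ == "__main__":
    weights = [0.5, 0.3, 0.2]  # hypothetical per-allele weights for 3 SNPs
    genotypes = [[2, 1, 0], [0, 0, 1], [1, 2, 2], [1, 0, 0]]  # toy data
    scores = [allele_score(g, weights) for g in genotypes]
    tall = [0, 0, 1, 1]  # toy outcome labels
    print(auc(scores, tall))  # 0.75
```

With toy data this small, the numbers only exercise the arithmetic. In a real evaluation the weights would be estimated on one set of individuals and the AUC and R² computed on held-out individuals, so that the reported accuracy is not inflated by overfitting.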