Department of Histology and Embryology, University of Medical Sciences, 60-781, Poznan, Poland.
Department of Anatomy, University of Medical Sciences, 60-781, Poznan, Poland.
Hum Genet. 2019 Jun;138(6):635-647. doi: 10.1007/s00439-019-02012-w. Epub 2019 Apr 12.
Predicting phenotypes from DNA has recently become an extensively studied field in forensic research and is referred to as Forensic DNA Phenotyping. Systems based on single nucleotide polymorphisms (SNPs) for accurate prediction of iris, hair and skin color in the global population, independent of bio-geographical ancestry, have recently been introduced. Here, we analyzed 14 SNPs for distinct skin pigmentation traits in a homogeneous cohort of 222 Polish subjects. We compared three algorithms (a General Linear Model based on logistic regression, Random Forest and Neural Network) across 18 developed prediction models. We demonstrate that Random Forest is the most accurate algorithm for 3- and 4-category estimations (total of 58.3% correct calls for skin color prediction, 47.2% for tanning prediction, 50% for freckling prediction). Binomial Logistic Regression was the best approach in 2-category estimations (total of 69.4% correct calls, AUC = 0.673 for tanning prediction; total of 52.8% correct calls, AUC = 0.537 for freckling prediction). Our study confirms the association of rs12913832 (HERC2) with all three skin pigmentation traits, and also identifies variants associated solely with certain pigmentation traits, namely rs6058017 and rs4911414 (ASIP) with skin sensitivity to sun and tanning ability, rs12203592 (IRF4) with freckling, and rs4778241 and rs4778138 (OCA2) with skin color and tanning. Finally, we observed significant differences in allele frequencies in comparison with CEU data. Our study provides a starting point for the development of prediction models for homogeneous populations with less internal differentiation than in global predictive testing.
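The two performance measures quoted above, the share of correct calls and the AUC, can be illustrated with a minimal sketch. This is not the study's pipeline: the function names, the 0.5 decision threshold and the eight-sample toy data are hypothetical, chosen only to show how each number is typically computed for a 2-category model.

```python
def correct_call_rate(y_true, y_pred):
    """Fraction of samples whose predicted category equals the observed one."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def binary_auc(y_true, scores):
    """AUC as the probability that a randomly chosen positive sample
    receives a higher score than a randomly chosen negative one
    (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data (NOT from the study): 1 = trait present, 0 = absent.
observed = [1, 0, 1, 1, 0, 0, 1, 0]
scores = [0.9, 0.4, 0.6, 0.3, 0.5, 0.2, 0.8, 0.7]  # model-predicted probabilities
predicted = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

print(f"correct calls: {correct_call_rate(observed, predicted):.1%}")
print(f"AUC: {binary_auc(observed, scores):.3f}")
```

The share of correct calls depends on the chosen threshold, whereas the AUC only depends on how the scores rank the samples. An AUC close to 0.5, such as the 0.537 reported for freckling, indicates ranking barely better than chance.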