You Haeun, Lee Soong Deok, Cho Sohee
Department of Forensic Medicine, Seoul National University College of Medicine, 103 Daehak-ro, Jongno-gu, Seoul, 03080, Republic of Korea.
Institute of Forensic and Anthropological Science, Seoul National University Medical Research Center, 103 Daehak-ro, Jongno-gu, Seoul, 03080, Republic of Korea.
Int J Legal Med. 2025 Mar;139(2):531-540. doi: 10.1007/s00414-024-03406-w. Epub 2025 Jan 8.
Inferring the ancestral origin of DNA evidence recovered from crime scenes is crucial in forensic investigations, especially in the absence of a direct suspect match. Ancestry informative markers (AIMs) have been widely researched and commercially developed into panels targeting multiple continental regions. However, existing forensic ancestry inference panels typically group East Asian individuals into a homogeneous category without further differentiation. In this study, we screened Y-chromosomal short tandem repeat (Y-STR) haplotypes from 10,154 Asian individuals to explore their genetic structure and generate an ancestry inference tool through a machine learning (ML) approach. Our research identified distinct genetic separations between East Asians and their neighboring Southwest Asians, with tendencies of northern and southern differentiation observed within East Asian populations. All machine learning models developed in this study demonstrated high accuracy, with the Asian classification model achieving an optimal performance of 82.92% and the East Asian classification model reaching 84.98% accuracy. This work not only deepens the understanding of genetic substructures within Asian populations but also showcases the potential of ML in forensic ancestry inference using extensive Y-STR data. By employing computational methods to analyze intricate genetic datasets, we can enhance the resolution of ancestry inference in forensic contexts involving Asian populations.