Song Mengyuan, Wang Xindi, Zhao Chenxi, Qian Xiaoqin, Lang Min, Hou Yiping, Song Feng
Department of Laboratory Medicine, West China Hospital, Sichuan University; Med+Molecular Diagnostics Institute of West China Hospital/West China School of Medicine, Chengdu, P. R. China.
Institute of Forensic Medicine, West China School of Basic Medical Sciences & Forensic Medicine, Sichuan University, Chengdu, P. R. China.
Electrophoresis. 2022 Dec;43(23-24):2351-2362. doi: 10.1002/elps.202200041. Epub 2022 Sep 3.
Over the past two decades, large amounts of Y chromosome data have been generated for human population genetic studies. These datasets were produced with different testing methods and marker sets, which makes them difficult to combine for a comprehensive analysis. In this study, we combined four human Y chromosomal datasets from the Han, Tibetan, Hui, and Li ethnic groups, restricted to the 27 microsatellites and 137 single nucleotide polymorphisms that the four datasets share. The resulting single dataset contains 2439 individuals from 25 populations across China. We performed a systematic analysis of genetic distance and clustering. To assess gene flow between the studied populations and worldwide populations, we modeled ancestry-informative markers. The reference panel was treated as a mixture of South Asian (SAS), East Asian (EAS), European (EUR), African (AFR), and American (AMR) populations from the 1000 Genomes Y chromosome data, and was fitted using nonlinear data-fitting. We then calculated the admixture proportions of the four studied populations relative to 26 worldwide populations. The results showed that the Han and Hui have close genetic affinity and that the Hui are the most admixed ethnic group, with 61.53% EAS, 34.65% SAS, 1.91% AFR, 1.56% AMR, and 0.04% EUR ancestry components (the AMR group is itself highly admixed, so this component should be ignored). The other three ethnic groups each carried more than 97% EAS ancestry, and the Li were the least admixed population in this study. The combined dataset is the largest of its kind reported to date and provides reference population data for future paternal genetic studies and forensic genealogical identification.
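The abstract does not say which nonlinear data-fitting procedure was used, so the following is only a minimal sketch of one common way to frame the problem, not the authors' method. It assumes each reference group is summarized by a vector of Y haplogroup frequencies and that a studied population's frequencies are approximated as a weighted mix of those vectors, with weights that are non-negative and sum to 1. The weights are found by projected-gradient least squares. The function names (`project_to_simplex`, `fit_admixture`), the reference labels, and all numbers are hypothetical toy values chosen for illustration and are not taken from the study.

```python
def project_to_simplex(v):
    """Euclidean projection of v onto {w : w_i >= 0, sum(w) == 1}."""
    u = sorted(v, reverse=True)
    cumulative = 0.0
    theta = 0.0
    for i, ui in enumerate(u, start=1):
        cumulative += ui
        t = (cumulative - 1.0) / i
        if ui - t > 0:
            theta = t  # keep the threshold from the last index that qualifies
    return [max(x - theta, 0.0) for x in v]


def fit_admixture(reference_freqs, target_freqs, steps=5000, lr=0.05):
    """Estimate mixture weights by minimizing the squared error between
    the weighted reference frequencies and the target frequencies.

    reference_freqs: dict mapping a reference label to a list of
        haplogroup frequencies (same haplogroup order for every label).
    target_freqs: list of haplogroup frequencies for the studied population.
    """
    names = list(reference_freqs)
    k = len(names)
    n = len(target_freqs)
    w = [1.0 / k] * k  # start from equal proportions
    for _ in range(steps):
        # Predicted target frequencies under the current weights.
        pred = [sum(w[j] * reference_freqs[names[j]][h] for j in range(k))
                for h in range(n)]
        resid = [p - t for p, t in zip(pred, target_freqs)]
        # Gradient of the squared error with respect to each weight.
        grad = [2.0 * sum(resid[h] * reference_freqs[names[j]][h]
                          for h in range(n))
                for j in range(k)]
        # Gradient step, then project back onto the valid proportions.
        w = project_to_simplex([wj - lr * g for wj, g in zip(w, grad)])
    return dict(zip(names, w))


# Toy example: three hypothetical reference groups, four haplogroups.
toy_refs = {
    "REF_A": [0.7, 0.2, 0.1, 0.0],
    "REF_B": [0.1, 0.6, 0.2, 0.1],
    "REF_C": [0.0, 0.1, 0.3, 0.6],
}
# Target built as 0.6*REF_A + 0.3*REF_B + 0.1*REF_C.
toy_target = [0.45, 0.31, 0.15, 0.09]

estimate = fit_admixture(toy_refs, toy_target)
for label, share in estimate.items():
    print(f"{label}: {share:.4f}")
```

The simplex projection keeps every intermediate estimate interpretable as a set of ancestry proportions. With real data, the quality of the fit would also depend on how well the chosen reference groups are separated from one another, which is one reason a heavily admixed reference such as AMR can yield a component that is hard to interpret.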