Departamento de Genética, Faculdade de Medicina de Ribeirão Preto, Universidade de São Paulo, Ribeirão Preto 14049-900, SP, Brazil.
Departamento de Química, Laboratório de Pesquisas Forenses e Genômicas, Faculdade de Filosofia, Ciências e Letras de Ribeirão Preto, Universidade de São Paulo, Ribeirão Preto 14040-901, SP, Brazil.
Genes (Basel). 2022 Nov 24;13(12):2205. doi: 10.3390/genes13122205.
Achieving accurate STR genotyping from next-generation sequencing data has been challenging. To provide the forensic genetics community with a reliable open-access STR database, we conducted a comprehensive genotyping analysis of a set of STRs of broad forensic interest in the 1000 Genomes Project populations. We analyzed 22 STR markers using files from the high-coverage dataset of Phase 3 of the 1000 Genomes Project, and used HipSTR to call genotypes for 2504 samples from 26 populations. We were not able to detect the D21S11 marker. Hardy-Weinberg equilibrium analysis, coupled with a comprehensive analysis of allele frequencies, revealed that HipSTR was not able to identify longer alleles, which resulted in heterozygote deficiency. Nevertheless, AMOVA, a clustering analysis using STRUCTURE, and a Principal Coordinates Analysis showed a clear-cut separation between the four major ancestries sampled by the 1000 Genomes Consortium. Except for the larger Penta D and Penta E alleles, and two very small Penta D alleles (2.2 and 3.2) usually observed in African populations, our analyses revealed that the allele frequencies and genotypes offered as an open-access database are consistent and reliable.
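The link between missed long alleles and heterozygote deficiency can be illustrated with a small toy calculation. The sketch below is not the authors' pipeline and does not model HipSTR itself. It assumes a hypothetical dropout rule (`drop_long_alleles`, with an invented `max_callable` length threshold) in which a heterozygote carrying one uncallable long allele is reported as a homozygote for the shorter allele, and it uses invented genotype counts chosen to sit exactly at Hardy-Weinberg proportions. The deficiency is summarized with the standard statistic F = 1 - Ho/He, without any small-sample correction.

```python
from collections import Counter


def heterozygosity(genotypes):
    """Return (observed, expected) heterozygosity for a list of allele pairs."""
    n = len(genotypes)
    observed = sum(1 for a, b in genotypes if a != b) / n
    counts = Counter(allele for pair in genotypes for allele in pair)
    # Expected heterozygosity under Hardy-Weinberg: 1 - sum of squared allele frequencies.
    expected = 1.0 - sum((c / (2 * n)) ** 2 for c in counts.values())
    return observed, expected


def inbreeding_coefficient(genotypes):
    """F = 1 - Ho/He; positive values indicate heterozygote deficiency."""
    observed, expected = heterozygosity(genotypes)
    return 1.0 - observed / expected


def drop_long_alleles(genotypes, max_callable):
    """Hypothetical dropout model (an assumption, not HipSTR's documented behavior).

    Alleles longer than max_callable repeats are never called. A genotype with
    one long allele is reported as homozygous for the other allele; a genotype
    with two long alleles yields no call and is dropped.
    """
    called = []
    for a, b in genotypes:
        short = [x for x in (a, b) if x <= max_callable]
        if len(short) == 2:
            called.append((a, b))
        elif len(short) == 1:
            called.append((short[0], short[0]))
    return called


# Invented toy sample of 100 individuals in exact Hardy-Weinberg proportions
# for alleles 10, 12 and 20 with frequencies 0.5, 0.3 and 0.2.
true_genotypes = (
    [(10, 10)] * 25 + [(12, 12)] * 9 + [(20, 20)] * 4
    + [(10, 12)] * 30 + [(10, 20)] * 20 + [(12, 20)] * 12
)
called_genotypes = drop_long_alleles(true_genotypes, max_callable=15)

f_true = inbreeding_coefficient(true_genotypes)
f_called = inbreeding_coefficient(called_genotypes)
print(f"F (true genotypes): {f_true:.3f}")
print(f"F (after dropout):  {f_called:.3f}")
```

In this toy setting the true genotypes give F close to zero, while the genotypes left after dropout give a clearly positive F, which is the pattern a Hardy-Weinberg test flags as heterozygote deficiency.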