Forensic Genetics Unit, Institute of Forensic Sciences, University of Santiago de Compostela, Santiago de Compostela, Spain.
Fundación Pública Galega de Medicina Xenómica SERGAS, Grupo de Medicina Xenómica USC, IDIS, Santiago de Compostela, Spain.
Forensic Sci Int Genet. 2020 May;46:102232. doi: 10.1016/j.fsigen.2020.102232. Epub 2020 Jan 17.
In a directed search of 1000 Genomes Phase III variation data, 271,934 tri-allelic single nucleotide polymorphisms (SNPs) were identified amongst the genotypes of 2,504 individuals from 26 populations. The majority of tri-allelic SNPs have three nucleotide substitution-based alleles at the same position, while a much smaller proportion, which we did not compile, have a nucleotide insertion/deletion plus substitution alleles. SNPs with three alleles have higher discrimination power than binary loci but keep the same characteristic of optimum amplification of the fragmented DNA found in highly degraded forensic samples. Although most of the tri-allelic SNPs identified had one or two alleles at low frequencies, often single observations, we present a full compilation of the genome positions, rs-numbers and genotypes of all tri-allelic SNPs detected by the 1000 Genomes project from the more detailed analyses it applied to Phase III sequence data. A total of 8,705 tri-allelic SNPs had overall heterozygosities (averaged across all 1000 Genomes populations) higher than the binary SNP maximum value of 0.5. Of these, 1,637 displayed the highest average heterozygosity values of 0.6-0.666. The most informative tri-allelic SNPs we identified were used to construct a large-scale human identification panel for massively parallel sequencing, designed for the identification of missing persons. The large-scale MPS identification panel comprised: 1,241 autosomal tri-allelic SNPs and 29 X tri-allelic SNPs (plus 46 microhaplotypes adapted for genotyping from reduced length sequences). Allele frequency estimates are detailed for African, European, South Asian and East Asian population groups plus the Peruvian population sampled by 1000 Genomes for the 1,270 tri-allelic SNPs of the final MPS panel. We describe the selection criteria, kinship simulation experiments and genomic analyses used to select the tri-allelic SNP components of the panel. 
Approximately 5% of the tri-allelic SNPs selected for the large-scale MPS identification panel gave three-genotype patterns in single individual samples or discordant genotypes for genomic control DNAs. A likely explanation for some of these unreliably genotyped loci is that they map to multiple sites in the genome. This highlights the need for caution and detailed scrutiny of multiple-allele variant data when designing future forensic SNP panels, as such patterns can arise from common structural variation in the genome, such as segmental duplications.
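The heterozygosity thresholds quoted in the abstract (a binary SNP maximum of 0.5 and a tri-allelic upper bound of 0.666) can be reproduced with a short calculation. The sketch below assumes the standard expected heterozygosity under Hardy-Weinberg proportions, H = 1 - Σ p_i², which the abstract does not state explicitly. The function name `expected_heterozygosity` and the example frequencies are illustrative only and are not taken from the paper.

```python
def expected_heterozygosity(freqs):
    """Expected heterozygosity H = 1 - sum(p_i^2) for allele frequencies p_i.

    Assumes Hardy-Weinberg proportions; this formula is an assumption here,
    not a method quoted from the abstract.
    """
    if any(p < 0 for p in freqs):
        raise ValueError("allele frequencies must be non-negative")
    if abs(sum(freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    return 1.0 - sum(p * p for p in freqs)


# Maximum for a binary SNP: both alleles at frequency 0.5.
binary_max = expected_heterozygosity([0.5, 0.5])

# Maximum for a tri-allelic SNP: all three alleles at frequency 1/3.
tri_max = expected_heterozygosity([1 / 3, 1 / 3, 1 / 3])

# Hypothetical tri-allelic locus with uneven (made-up) allele frequencies.
example = expected_heterozygosity([0.5, 0.3, 0.2])

print(f"binary maximum:      {binary_max:.3f}")
print(f"tri-allelic maximum: {tri_max:.3f}")
print(f"example locus:       {example:.3f}")
```

Under this assumption, any tri-allelic locus scoring above 0.5 is more informative than the best possible binary SNP, which is the cut-off the abstract applies to the 8,705 loci. The abstract averages heterozygosity across the 1000 Genomes populations; averaging the per-population values returned by this function would be one way to approximate that step.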