Departamento de Química, Laboratório de Pesquisas Forenses e Genômicas, Faculdade de Filosofia, Ciências e Letras de Ribeirão Preto, Universidade de São Paulo, 14040-901 Ribeirão Preto, SP, Brazil.
Departamento de Genética, Faculdade de Medicina de Ribeirão Preto, Universidade de São Paulo, 14049-900 Ribeirão Preto, SP, Brazil.
Forensic Sci Int Genet. 2022 May;58:102676. doi: 10.1016/j.fsigen.2022.102676. Epub 2022 Feb 4.
Short tandem repeats (STRs) are particularly difficult to genotype with rapidly evolving next-generation sequencing (NGS) technology. Long amplicons containing repetitive sequences result in alignment and genotyping errors, and stutter arising from polymerase slippage often produces reads with additional or missing repeat copies. Many tools are available for the analysis of STR markers from NGS data. This study evaluated the concordance of the HipSTR, STRait Razor, and toaSTR tools for STR genotype calling, using NGS data obtained from a highly genetically diverse Brazilian population sample. We found that toaSTR retrieves the largest proportion of genotypes (93.8%), whereas HipSTR (84.9%) and STRait Razor (75.3%) have much lower genotype call rates. Accuracy levels for genotype calling are very similar across the three methods (identical genotypes ~95% and correct alleles ~97.5%). All the markers presenting the same genotype across the three methods are in Hardy-Weinberg equilibrium. We found that the combined match probability and combined exclusion power are 2.90 × 10 and 0.99999999982, respectively. Despite locus-specific differences and the better overall performance of toaSTR, the three programs are reliable genotyping tools. Nevertheless, additional effort is necessary to improve the genotype calling accuracy of next-generation sequencing datasets.
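The combined match probability and combined exclusion power reported above are conventionally obtained by combining per-locus values under the assumption that the loci are independent. The following is a minimal sketch of that standard calculation, not the authors' pipeline; the function names and the per-locus numbers are hypothetical and chosen only for illustration.

```python
from math import prod


def combined_match_probability(per_locus_mp):
    """Multiply per-locus match probabilities (assumes independent loci)."""
    return prod(per_locus_mp)


def combined_power_of_exclusion(per_locus_pe):
    """Return 1 minus the product of per-locus non-exclusion probabilities
    (assumes independent loci)."""
    return 1.0 - prod(1.0 - pe for pe in per_locus_pe)


# Hypothetical per-locus values, NOT taken from the study.
example_mp = [0.05, 0.10, 0.02]
example_pe = [0.60, 0.50, 0.70]

cmp_value = combined_match_probability(example_mp)
cpe_value = combined_power_of_exclusion(example_pe)

print(f"combined match probability: {cmp_value:.3e}")
print(f"combined power of exclusion: {cpe_value:.4f}")
```

Because every additional independent locus multiplies the match probability by a value below 1, panels with many STR markers reach very small combined match probabilities and exclusion powers very close to 1.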