Department of Public Health Sciences and Center for Public Health Genomics, University of Virginia, Charlottesville, VA, USA.
Evolutionary Genetics Group, Department of Anthropology, University of Zurich, CH-8057 Zurich, Switzerland.
Bioinformatics. 2018 Dec 1;34(23):4115-4117. doi: 10.1093/bioinformatics/bty485.
Massively parallel capture of short tandem repeats (STRs, or microsatellites) provides a strategy for population genomic and demographic analyses at high resolution with or without a reference genome. However, the high polymerase chain reaction (PCR) cycle numbers needed for target capture experiments create genotyping noise through polymerase slippage, known as PCR stutter.
We developed SONiCS (Stutter mONte Carlo Simulation), a solution for stutter correction based on dense forward simulations of PCR and capture experimental conditions. To test SONiCS, we genotyped a 2499-marker STR panel in 22 humpback dolphins (Sousa sahulensis) using target capture, and generated capillary-based genotypes to validate five of these markers. In these 110 comparisons, SONiCS showed a 99.1% accuracy rate and a 98.2% genotyping success rate, miscalling a single allele in a marker with low sequence coverage and rejecting another as un-callable.
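To illustrate what a forward simulation of PCR stutter can look like, the following is a minimal toy sketch, not the SONiCS implementation. The function name simulate_stutter and all parameters (cycle count, amplification efficiency, down- and up-stutter probabilities, starting copy number) are hypothetical values chosen for illustration. Each cycle copies every molecule with a fixed efficiency, and each copy may lose or gain one repeat unit.

```python
import random
from collections import Counter


def simulate_stutter(genotype, cycles=12, efficiency=0.8,
                     p_down=0.03, p_up=0.005, start_copies=10, seed=0):
    """Toy forward simulation of PCR on a diploid STR genotype.

    genotype: pair of allele lengths in repeat units, e.g. (10, 14).
    Returns a Counter mapping repeat length -> number of molecules.
    All parameter defaults are illustrative assumptions.
    """
    rng = random.Random(seed)
    pool = Counter()
    for allele in genotype:
        pool[allele] += start_copies

    for _ in range(cycles):
        new = Counter()
        for length, n in pool.items():
            for _ in range(n):
                if rng.random() >= efficiency:
                    continue  # this template was not copied in this cycle
                r = rng.random()
                if r < p_down:
                    child = max(length - 1, 1)  # polymerase slips, copy loses a repeat
                elif r < p_down + p_up:
                    child = length + 1          # copy gains a repeat
                else:
                    child = length              # faithful copy
                new[child] += 1
        pool.update(new)  # templates persist; new copies join the pool
    return pool


counts = simulate_stutter((10, 14))
print(sorted(counts.items()))
```

A stutter-aware genotyper could, in principle, run many such simulations across candidate genotypes and parameter values and keep the candidate whose simulated length distribution best matches the observed reads; how SONiCS performs this comparison is described in its documentation rather than here.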
Source code and documentation for SONiCS are freely available at https://github.com/kzkedzierska/sonics. Raw read data used in experimental validation of SONiCS have been deposited in the Sequence Read Archive under accession number SRP135756.
Supplementary data are available at Bioinformatics online.