Department of Computer Science and Applied Mathematics, Weizmann Institute of Science, Rehovot 761001, Israel.
Nucleic Acids Res. 2019 Mar 18;47(5):2436-2445. doi: 10.1093/nar/gky1318.
Short tandem repeats (STRs) are polymorphic genomic loci valuable for applications such as research, diagnostics and forensics. However, their polymorphic nature also introduces noise, known as stutter, during in vitro amplification, making them difficult to analyze. Although stutter noise can be avoided by using amplification-free library preparation, such protocols are presently incompatible with single-cell analysis and with targeted-enrichment protocols. To address this challenge, we have designed a method for direct measurement of in vitro noise. Using a synthetic STR sequencing library, we have calibrated a Markov model that predicts stutter patterns at any amplification cycle. Using this model, we accurately genotyped cases of severe amplification bias as well as biallelic STR signals, and validated the model for several high-fidelity PCR enzymes. Finally, we compared this model, in the context of a naïve STR genotyping strategy, against the state of the art on a benchmark of single cells, demonstrating superior accuracy.
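To make the idea of a Markov model of stutter concrete, the following is a minimal toy sketch, not the calibrated model described in the abstract. It assumes that in each PCR cycle a newly synthesized strand loses one repeat unit with probability `p_down`, gains one with probability `p_up`, and is otherwise copied faithfully, while template strands persist unchanged. The function names, the single-unit slippage assumption, the `efficiency` parameter and all numeric values are hypothetical and chosen only for illustration.

```python
def step_matrix(max_len, p_down, p_up, efficiency=1.0):
    """One-cycle, row-stochastic transition matrix over repeat counts 0..max_len.

    Toy assumption: after a cycle, a fraction 1/(1+efficiency) of molecules are
    unchanged templates and the rest are new copies that may slip by one unit.
    """
    n = max_len + 1
    keep = 1.0 / (1.0 + efficiency)         # surviving template strands
    copy = efficiency / (1.0 + efficiency)  # newly synthesized strands
    T = [[0.0] * n for _ in range(n)]
    for i in range(n):
        down = p_down if i > 0 else 0.0     # cannot contract below 0 repeats
        up = p_up if i < max_len else 0.0   # truncate the state space at max_len
        T[i][i] += keep + copy * (1.0 - down - up)
        if i > 0:
            T[i][i - 1] += copy * down
        if i < max_len:
            T[i][i + 1] += copy * up
    return T


def stutter_pattern(true_len, cycles, max_len, p_down, p_up, efficiency=1.0):
    """Predicted distribution of repeat counts after a given number of cycles."""
    n = max_len + 1
    T = step_matrix(max_len, p_down, p_up, efficiency)
    dist = [0.0] * n
    dist[true_len] = 1.0                    # all molecules start at the true length
    for _ in range(cycles):
        dist = [sum(dist[i] * T[i][j] for i in range(n)) for j in range(n)]
    return dist


def naive_genotype(observed, cycles, p_down, p_up, efficiency=1.0):
    """Pick the monoallelic length whose predicted pattern is closest (L1) to observed."""
    max_len = len(observed) - 1
    total = float(sum(observed))
    freqs = [x / total for x in observed]

    def distance(candidate):
        pred = stutter_pattern(candidate, cycles, max_len, p_down, p_up, efficiency)
        return sum(abs(a - b) for a, b in zip(freqs, pred))

    return min(range(max_len + 1), key=distance)


if __name__ == "__main__":
    # Illustrative, uncalibrated slippage rates.
    pattern = stutter_pattern(true_len=12, cycles=30, max_len=20, p_down=0.02, p_up=0.002)
    for length, freq in enumerate(pattern):
        if freq > 0.001:
            print(length, round(freq, 4))
    print("called length:", naive_genotype(pattern, 30, 0.02, 0.002))
```

In this sketch the genotype call is simply the candidate length whose predicted histogram best matches the observed one. A biallelic signal would need a mixture of two predicted patterns, which is omitted here for brevity.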