The Bioinformatics Centre, Department of Biology, University of Copenhagen, Copenhagen, Denmark.
Nat Genet. 2018 Jul;50(7):1054-1059. doi: 10.1038/s41588-018-0145-5. Epub 2018 Jun 18.
Genotype estimates from short-read sequencing data are typically based on the alignment of reads to a linear reference, but reads originating from more complex variants (for example, structural variants) often align poorly, resulting in biased genotype estimates. This bias can be mitigated by first collecting a set of candidate variants across discovery methods, individuals and databases, and then realigning the reads to the variants and reference simultaneously. However, this realignment problem has proved computationally difficult. Here, we present a new method (BayesTyper) that uses exact alignment of read k-mers to a graph representation of the reference and variants to efficiently perform unbiased, probabilistic genotyping across the variation spectrum. We demonstrate that BayesTyper generally provides superior variant sensitivity and genotyping accuracy relative to existing methods when used to integrate variants across discovery approaches and individuals. Finally, we demonstrate that including a 'variation-prior' database containing already known variants significantly improves sensitivity.
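The general idea of genotyping by exact k-mer matching, rather than read alignment, can be illustrated with a toy sketch. It makes several assumptions. A single biallelic site is represented by its two allele paths, each with the same flanking sequence (one "bubble" in a variant graph). Reads are error-free and on the forward strand. Allele-specific k-mer counts follow a Poisson model. The function names (kmers, allele_specific_kmers, genotype_log_likelihoods), the Poisson model and the hap_depth/floor parameters are hypothetical illustrations. They are not BayesTyper's implementation or statistical model, which the abstract does not detail.

```python
import math
from collections import Counter


def kmers(seq, k):
    """All overlapping length-k substrings of seq."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def allele_specific_kmers(alleles, k):
    """For each allele path, keep only k-mers absent from every other allele path."""
    sets = [set(kmers(a, k)) for a in alleles]
    specific = []
    for i, s in enumerate(sets):
        others = set()
        for j, t in enumerate(sets):
            if j != i:
                others |= t
        specific.append(s - others)
    return specific


def poisson_logpmf(x, lam):
    """log P(X = x) for X ~ Poisson(lam)."""
    return x * math.log(lam) - lam - math.lgamma(x + 1)


def genotype_log_likelihoods(alleles, reads, k, hap_depth, floor=0.01):
    """Toy diploid genotype log-likelihoods from exact k-mer matches.

    Assumption (hypothetical, not BayesTyper's model): each allele-specific
    k-mer count is Poisson with mean hap_depth per copy of that allele in the
    genotype, with a small floor mean for absent alleles to absorb errors.
    """
    specific = allele_specific_kmers(alleles, k)
    # Exact k-mer matching: count every k-mer observed in the reads.
    counts = Counter()
    for read in reads:
        counts.update(kmers(read, k))

    scores = {}
    for a in range(len(alleles)):
        for b in range(a, len(alleles)):  # unordered genotype (a, b)
            ll = 0.0
            for i, kmer_set in enumerate(specific):
                copies = (a == i) + (b == i)  # copies of allele i in (a, b)
                lam = max(copies * hap_depth, floor)
                for km in kmer_set:
                    ll += poisson_logpmf(counts[km], lam)
            scores[(a, b)] = ll
    return scores


if __name__ == "__main__":
    left, right = "ACGTTGCA", "GGATCCTA"
    ref, alt = left + "A" + right, left + "T" + right  # an SNV bubble
    reads = [ref, ref, alt, alt]
    scores = genotype_log_likelihoods([ref, alt], reads, k=5, hap_depth=2)
    print(max(scores, key=scores.get))  # (0, 1): heterozygous
```

Scoring only allele-specific k-mers means no read is ever aligned, so a read from a complex allele is not penalized for diverging from the linear reference. A practical genotyper would also need to handle k-mers shared between nearby variants, reverse-complement reads, sequencing errors and overdispersed coverage, none of which this sketch attempts.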