Department of Informatics, University of Oslo, Gaustadalleen 23 B, Oslo, 0371, Norway.
Department of Mathematics, University of Oslo, Moltke Moes vei 35, Oslo, 0851, Norway.
BMC Genomics. 2020 Apr 6;21(1):282. doi: 10.1186/s12864-020-6685-y.
Graph-based reference genomes have become popular as they allow read mapping and follow-up analyses in settings where the exact haplotypes underlying a high-throughput sequencing experiment are not precisely known. Two recent papers show that mapping to graph-based reference genomes can improve accuracy as compared to methods using linear references. Both of these methods index the sequences for most paths up to a certain length in the graph in order to enable direct mapping of reads containing common variants. However, the combinatorial explosion of possible paths through nearby variants also leads to a huge search space and an increased chance of false positive alignments to highly variable regions.
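The growth in the number of paths can be made concrete with a toy example. The sketch below is purely illustrative and is not taken from any of the mappers discussed: the function names `path_count` and `enumerate_paths`, and the representation of a graph region as shared segments separated by variant sites, are assumptions made for this example.

```python
from itertools import product
from math import prod


def path_count(variants):
    """Number of distinct allele combinations across a run of variant sites."""
    return prod(len(alleles) for alleles in variants)


def enumerate_paths(segments, variants):
    """Yield the sequence spelled by every path through a toy graph region.

    segments: shared sequences between variant sites (len(variants) + 1 items)
    variants: one tuple of alternative alleles per site ("" models a deletion)
    """
    if len(segments) != len(variants) + 1:
        raise ValueError("need exactly one more segment than variant sites")
    for choice in product(*variants):
        parts = [segments[0]]
        for allele, segment in zip(choice, segments[1:]):
            parts.append(allele)
            parts.append(segment)
        yield "".join(parts)


# Hypothetical region: three nearby variant sites (two SNPs, one indel).
segments = ["ACGT", "GG", "TTA", "C"]
variants = [("A", "G"), ("C", "T"), ("", "AT")]
paths = list(enumerate_paths(segments, variants))

print(len(paths))                      # 8 paths for 3 biallelic sites
print(path_count([("A", "G")] * 20))   # 1048576 paths for 20 biallelic sites
```

Each additional biallelic site within read length doubles the number of sequences an index has to cover, which is why indexing is typically restricted to a subset of paths and why dense variant clusters enlarge the search space.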
Here we assess three prominent graph-based read mappers against a hybrid baseline approach that combines an initial path determination with a tuned linear read mapping method. Using a previously proposed benchmark, we show that this simple approach can improve the overall accuracy of read mapping to graph-based reference genomes.
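The general shape of such a two-step strategy can be sketched on toy data. The code below is a minimal illustration under strong assumptions and is not the algorithm implemented in Two-step Graph Mapper: the functions `choose_path`, `path_sequence`, and `map_reads` are hypothetical, path determination is reduced to counting reads that contain each allele in its local context, and the linear mapping step is reduced to exact substring search.

```python
def choose_path(segments, variants, reads, flank=2):
    """Step 1 (toy): at each variant site, keep the allele whose local
    context (flanking bases + allele) occurs in the most reads.
    Ties go to the first allele listed, taken here to be the reference."""
    chosen = []
    for i, alleles in enumerate(variants):
        left, right = segments[i], segments[i + 1]

        def support(allele):
            context = left[-flank:] + allele + right[:flank]
            return sum(context in read for read in reads)

        chosen.append(max(alleles, key=support))
    return chosen


def path_sequence(segments, chosen):
    """Spell out the linear sequence of the chosen path."""
    parts = [segments[0]]
    for allele, segment in zip(chosen, segments[1:]):
        parts.append(allele)
        parts.append(segment)
    return "".join(parts)


def map_reads(linear, reads):
    """Step 2 (toy): place each read on the linear path by exact match.
    A position of -1 means the read did not map."""
    return {read: linear.find(read) for read in reads}


# Hypothetical graph region with two SNP sites, and reads from one haplotype.
segments = ["ACGTAC", "GGATCC", "TTAGCA"]
variants = [("A", "G"), ("C", "T")]
reads = ["TACGGG", "GATCCT", "CCTTTA"]

chosen = choose_path(segments, variants, reads)
linear = path_sequence(segments, chosen)
positions = map_reads(linear, reads)
print(chosen, linear, positions)
```

Once a single path has been fixed, the second step only has to search one linear sequence, so it avoids the enlarged search space of the full graph at the cost of depending on the quality of the initial path estimate.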
Our method is implemented in a tool, Two-step Graph Mapper, which is available at https://github.com/uio-bmi/two_step_graph_mapper along with data and scripts for reproducing the experiments. Our method highlights characteristics of the current generation of graph-based read mappers and shows potential for improvement in future graph-based read mappers.