School of Physics and Opto-Electronics Technology, Baoji University of Arts and Sciences, Baoji, 721016, China.
Division of Biomedical Engineering, Department of Computer Science and Department of Mechanical Engineering, University of Saskatchewan, Saskatoon, SK S7N 5A9, Canada.
Brief Bioinform. 2024 Jan 22;25(2). doi: 10.1093/bib/bbae107.
With the rapid development of single-molecule sequencing (SMS) technologies, read lengths continue to increase. Mapping such reads onto a reference genome is one of the most fundamental tasks in sequence analysis. Mapping sensitivity is becoming a major concern, because a more sensitive mapper detects more aligned regions on the reference and recovers more aligned bases, both of which are useful for downstream analysis. In this study, we present pathMap, a novel k-mer graph-based mapper designed specifically for mapping SMS reads with high sensitivity. pathMap views an alignment chain as a path through the matched k-mer graph that contains as many anchors as possible, and so treats chaining as a path selection problem in a directed graph. By iteratively searching for the longest path among the remaining nodes, pathMap can effectively detect and align more high-quality candidate chains. In experiments on simulated and real-life datasets, pathMap reports at least 11.50% more mapped chains than its closest competitor among state-of-the-art mappers such as minimap2 and Winnowmap2, and increases mapping sensitivity by 17.28% and 13.84% of bases over the next-best mapper for Pacific Biosciences and Oxford Nanopore sequencing data, respectively. In addition, pathMap is more robust to sequencing errors and more sensitive in species- and strain-specific identification of pathogens from MinION reads.
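The idea of chaining as iterative longest-path selection can be illustrated with a minimal sketch. This is not pathMap's implementation: the anchor representation (a pair of read and reference positions), the colinearity rule, and the hypothetical parameters `max_gap` and `min_anchors` are all assumptions made for illustration. Because edges only run from earlier to later anchors, the graph is acyclic and the longest path can be found by simple dynamic programming.

```python
from typing import List, Tuple

# Assumed representation: (read_pos, ref_pos) of one matched k-mer.
Anchor = Tuple[int, int]


def longest_path(anchors: List[Anchor], max_gap: int) -> List[Anchor]:
    """Return the path with the most anchors in the implicit anchor DAG.

    An edge i -> j exists (by assumption) when anchor j lies after anchor i
    on both read and reference, within max_gap on each axis.
    """
    if not anchors:
        return []
    nodes = sorted(anchors)          # sorting by read position gives a topological order
    best = [1] * len(nodes)          # best[j]: anchors on the longest path ending at j
    prev = [-1] * len(nodes)         # back-pointers for path recovery
    for j, (qj, rj) in enumerate(nodes):
        for i in range(j):
            qi, ri = nodes[i]
            dq, dr = qj - qi, rj - ri
            if 0 < dq <= max_gap and 0 < dr <= max_gap and best[i] + 1 > best[j]:
                best[j] = best[i] + 1
                prev[j] = i
    end = max(range(len(nodes)), key=best.__getitem__)
    path = []
    while end != -1:                 # walk back-pointers from the best endpoint
        path.append(nodes[end])
        end = prev[end]
    return path[::-1]


def iterative_chains(anchors: List[Anchor], max_gap: int = 500,
                     min_anchors: int = 2) -> List[List[Anchor]]:
    """Repeatedly extract the longest path from the remaining anchors."""
    remaining = set(anchors)
    chains = []
    while remaining:
        path = longest_path(list(remaining), max_gap)
        if len(path) < min_anchors:  # nothing chain-like is left
            break
        chains.append(path)
        remaining.difference_update(path)  # used anchors leave the graph
    return chains


anchors = [(0, 100), (10, 110), (20, 120), (30, 130),   # one colinear group
           (5, 900), (15, 910), (25, 920),              # a second group
           (50, 5000)]                                  # an isolated anchor
chains = iterative_chains(anchors)
```

In this toy input the first iteration picks the four-anchor group, the second picks the three-anchor group, and the isolated anchor is discarded. The quadratic inner loop is kept for readability; a real mapper would bound the predecessor search and score gaps rather than just counting anchors.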