Institute of Informatics, Faculty of Automatic Control, Electronics and Computer Science, Silesian University of Technology, Akademicka 16, Gliwice, Poland.
Institute of Applied Computer Science, Faculty of Electrical, Electronic, Computer and Control Engineering, Lodz University of Technology, Stefanowskiego 18/22, Łódź, Poland.
Bioinformatics. 2019 Jun 1;35(12):2043-2050. doi: 10.1093/bioinformatics/bty927.
Mapping reads to a reference genome is often the first step in a sequencing data analysis pipeline. The reduction of sequencing costs implies a need for algorithms able to process increasing amounts of generated data in reasonable time.
We present Whisper, an accurate, high-performance mapping tool based on sorting the reads and then mapping them against suffix arrays built for the reference genome and its reverse complement. Combining task and data parallelism with on-disk storage of temporary data yields superior time efficiency at reasonable memory requirements. Whisper excels on large NGS read collections, in particular Illumina reads at typical WGS coverage. Experiments with real data indicate that our solution runs in about 15% of the time needed by the well-known BWA-MEM and Bowtie2 tools at comparable accuracy, as validated in a variant calling pipeline.
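The core idea of sorting reads and looking them up in suffix arrays of the reference and its reverse complement can be illustrated with a toy Python sketch. This is not Whisper's implementation: it assumes exact matching only, builds the suffix arrays naively, keeps everything in memory, and uses no parallelism. All function names (`reverse_complement`, `build_suffix_array`, `find_exact`, `map_reads`) are hypothetical and chosen for illustration.

```python
# Toy illustration only: exact matching, naive suffix array construction.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string over A, C, G, T."""
    return seq.translate(COMPLEMENT)[::-1]


def build_suffix_array(text):
    """Naive suffix array: start positions of suffixes in lexicographic order."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_exact(text, sa, pattern):
    """Binary-search the suffix array for all exact occurrences of pattern."""
    m = len(pattern)
    lo, hi = 0, len(sa)
    while lo < hi:  # first suffix whose m-character prefix is >= pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo
    hi = len(sa)
    while lo < hi:  # first suffix whose m-character prefix is > pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return sorted(sa[start:lo])


def map_reads(reference, reads):
    """Map each distinct read to (forward-strand start, strand) hits."""
    rc = reverse_complement(reference)
    sa_fwd = build_suffix_array(reference)
    sa_rc = build_suffix_array(rc)
    hits = {}
    # Processing reads in sorted order means consecutive reads share prefixes,
    # so they touch neighbouring regions of the suffix arrays.
    for read in sorted(set(reads)):
        fwd = [(p, "+") for p in find_exact(reference, sa_fwd, read)]
        # A hit at position p in the reverse complement corresponds to a
        # forward-strand start of len(reference) - p - len(read).
        rev = sorted(
            (len(reference) - p - len(read), "-")
            for p in find_exact(rc, sa_rc, read)
        )
        hits[read] = fwd + rev
    return hits


if __name__ == "__main__":
    print(map_reads("GATTACAGG", ["TTAC", "TGTA", "AAAA"]))
```

In this sketch a read that matches the reverse-complement suffix array is reported with strand "-" and a coordinate converted back to the forward strand. A practical mapper would additionally need to tolerate mismatches and indels, which this example does not attempt.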
Whisper is available for free from https://github.com/refresh-bio/Whisper or http://sun.aei.polsl.pl/REFRESH/Whisper/.
Supplementary data are available at Bioinformatics online.