Department of Physics and Astronomy, Johns Hopkins University, Baltimore, MD, USA.
School of Medicine, Sun Yat-sen University, Guangdong, China.
Bioinformatics. 2018 Aug 1;34(15):2673-2675. doi: 10.1093/bioinformatics/bty167.
The alignment of bisulfite-treated DNA sequences (BS-seq reads) to a large genome involves a significant computational burden beyond that required to align non-bisulfite-treated reads. In the analysis of BS-seq data, this can present an important performance bottleneck that can be mitigated by appropriate algorithmic and software-engineering improvements. One strategy is to modify the read-alignment algorithms by integrating the logic related to BS-seq alignment, with the goal of making the software implementation amenable to optimizations that lead to higher speed and greater sensitivity than might otherwise be attainable.
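The extra burden comes from the chemistry: bisulfite treatment converts unmethylated cytosines so that they are read as T, which means a read can differ from the reference at any C. A common way to handle this in BS-seq aligners generally is to convert C to T in both the read and the reference before matching, and then compare the original sequences to call methylation. The sketch below illustrates that general idea for one strand only; it is not taken from the Arioc source, and the function names are hypothetical.

```python
from typing import List, Tuple


def c_to_t(seq: str) -> str:
    """In silico bisulfite conversion: every C becomes T (forward strand only)."""
    return seq.upper().replace("C", "T")


def matches_after_conversion(read: str, ref_window: str) -> bool:
    """True if read and reference window are identical in the converted alphabet."""
    return c_to_t(read) == c_to_t(ref_window)


def methylation_calls(read: str, ref_window: str) -> List[Tuple[int, str]]:
    """For each reference C, report whether the read kept C or shows T."""
    calls = []
    for i, (read_base, ref_base) in enumerate(zip(read.upper(), ref_window.upper())):
        if ref_base != "C":
            continue
        if read_base == "C":
            calls.append((i, "methylated"))    # C survived bisulfite treatment
        elif read_base == "T":
            calls.append((i, "unmethylated"))  # C was converted and read as T
    return calls


# Illustrative sequences (made up for this example, not real data).
ref = "ACGTCCGA"
read = "ATGTTCGA"
print(matches_after_conversion(read, ref))
print(methylation_calls(read, ref))
```

A full aligner would also handle the reverse strand (G to A conversion) and tolerate mismatches and gaps, which is part of why the search is more expensive than for untreated reads.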
We evaluated this strategy using Arioc, a short-read aligner that uses GPU (general-purpose graphics processing unit) hardware to accelerate computationally expensive programming logic. We integrated the BS-seq computational logic into both GPU and CPU code throughout the Arioc implementation. We then carried out a read-by-read comparison of Arioc's reported alignments with the alignments reported by well-known CPU-based BS-seq read aligners. With simulated reads, Arioc's accuracy is equal to or better than that of the other read aligners we evaluated. With human sequencing reads, Arioc's throughput is at least 10 times higher than that of existing BS-seq aligners across a wide range of sensitivity settings.
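A read-by-read comparison of two aligners' output can be pictured as follows. This is a minimal sketch under stated assumptions: each aligner's result is assumed to be already parsed into a dictionary keyed by read ID, with one (reference name, position, strand) tuple per read. The data layout, the function name, and the exact-match criterion are all hypothetical and do not describe the evaluation procedure actually used for Arioc.

```python
from typing import Dict, Tuple

# Hypothetical record: (reference name, 1-based position, strand)
Alignment = Tuple[str, int, str]


def compare_read_by_read(
    a: Dict[str, Alignment], b: Dict[str, Alignment]
) -> Dict[str, int]:
    """Count reads mapped identically, mapped differently, or mapped by one aligner only."""
    counts = {"same": 0, "different": 0, "only_a": 0, "only_b": 0}
    for read_id in a.keys() | b.keys():
        if read_id in a and read_id in b:
            key = "same" if a[read_id] == b[read_id] else "different"
        elif read_id in a:
            key = "only_a"
        else:
            key = "only_b"
        counts[key] += 1
    return counts


# Made-up alignments for illustration only.
aligner_a = {"r1": ("chr1", 100, "+"), "r2": ("chr1", 200, "-"), "r3": ("chr2", 50, "+")}
aligner_b = {"r1": ("chr1", 100, "+"), "r2": ("chr1", 205, "-"), "r4": ("chr3", 10, "+")}
print(compare_read_by_read(aligner_a, aligner_b))
```

With simulated reads, the same tally could be made against the known origin of each read instead of a second aligner's output.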
The Arioc software is available for download at https://github.com/RWilton/Arioc. It is released under a BSD open-source license.
Supplementary data are available at Bioinformatics online.