Flexbar 3.0 - SIMD and multicore parallelization.

Affiliations

Institute of Bioinformatics, Department of Mathematics and Computer Science, FU Berlin, 14195 Berlin, Germany.

Klaus Tschira Institute for Integrative Computational Cardiology, Department of Internal Medicine III, University of Heidelberg, 69120 Heidelberg, Germany.

Publication information

Bioinformatics. 2017 Sep 15;33(18):2941-2942. doi: 10.1093/bioinformatics/btx330.

DOI:10.1093/bioinformatics/btx330

PMID:28541403

Abstract

MOTIVATION

High-throughput sequencing machines can process many samples in a single run. For Illumina systems, sequencing reads are barcoded with an additional DNA tag that is contained in the respective sequencing adapters. Recognizing barcode and adapter sequences is therefore a common step in the analysis of next-generation sequencing data. Flexbar performs barcode-based demultiplexing and adapter trimming for such data. The massive amounts of data generated by modern sequencing machines demand that this preprocessing be done as efficiently as possible.
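The two preprocessing steps named above can be illustrated with a minimal Python sketch. It is a simplified stand-in, not Flexbar's own implementation. A 3' adapter is located with a semi-global edit-distance alignment: the adapter may begin anywhere in the read, and the read may end inside the adapter. Everything from the leftmost acceptable match onward is trimmed. Barcodes are assigned by Hamming distance on the read prefix. That is a simplification, since the abstract states Flexbar detects barcodes by alignment as well. The names find_adapter_start, trim_adapter and assign_barcode, and the parameters max_error_rate, min_overlap and max_mismatches, are hypothetical.

```python
from typing import Dict, Optional


def find_adapter_start(read: str, adapter: str, max_error_rate: float = 0.1,
                       min_overlap: int = 3) -> Optional[int]:
    """Leftmost read position where the adapter (or a prefix of it) begins.

    Hypothetical helper: semi-global unit-cost alignment with free leading
    gaps in the read, so the adapter may start at any read position.
    """
    n, m = len(read), len(adapter)
    # cost[k][j]: fewest edits aligning adapter[:k] to some read[s:j];
    # start[k][j]: the s used by that optimal alignment.
    cost = [[0] * (n + 1) for _ in range(m + 1)]
    start = [[0] * (n + 1) for _ in range(m + 1)]
    for j in range(n + 1):
        start[0][j] = j  # empty adapter prefix "starts" at j for free
    for k in range(1, m + 1):
        cost[k][0] = k  # adapter chars aligned to gaps before the read
        for j in range(1, n + 1):
            sub = cost[k - 1][j - 1] + (adapter[k - 1] != read[j - 1])
            gap_in_read = cost[k - 1][j] + 1
            gap_in_adapter = cost[k][j - 1] + 1
            best = min(sub, gap_in_read, gap_in_adapter)
            cost[k][j] = best
            if best == sub:
                start[k][j] = start[k - 1][j - 1]
            elif best == gap_in_read:
                start[k][j] = start[k - 1][j]
            else:
                start[k][j] = start[k][j - 1]
    # Either the whole adapter ends at read position j, or the read ends
    # inside the adapter (adapter[:k] reaches the read's 3' end).
    candidates = [(cost[m][j], start[m][j], j) for j in range(n + 1)]
    candidates += [(cost[k][n], start[k][n], n) for k in range(1, m)]
    best_start = None
    for errors, s, end in candidates:
        span = end - s
        if span >= min_overlap and errors <= max_error_rate * span:
            if best_start is None or s < best_start:
                best_start = s
    return best_start


def trim_adapter(read: str, adapter: str, **kwargs) -> str:
    """Remove the adapter and everything after it (3' trimming)."""
    s = find_adapter_start(read, adapter, **kwargs)
    return read if s is None else read[:s]


def assign_barcode(read: str, barcodes: Dict[str, str],
                   max_mismatches: int = 1) -> Optional[str]:
    """Sample name whose barcode best matches the read prefix, else None.

    Simplification: Hamming distance only; ties are treated as ambiguous.
    """
    best_name, best_dist, tie = None, max_mismatches + 1, False
    for name, bc in barcodes.items():
        if len(read) < len(bc):
            continue
        d = sum(a != b for a, b in zip(read, bc))
        if d < best_dist:
            best_name, best_dist, tie = name, d, False
        elif d == best_dist and best_name is not None:
            tie = True
    return None if tie else best_name
```

For example, trim_adapter("ACGTACGTAGATCGG", "AGATCGGAAGAGC") returns "ACGTACGT", because the read ends with the first seven adapter bases.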

RESULTS

We present Flexbar 3.0, the successor of the popular program Flexbar. It now employs two forms of parallelism: multi-threading and, additionally, SIMD vectorization. Both are used to speed up the computation of pairwise sequence alignments, which are used to detect barcodes and adapters. Furthermore, new features were added to cover a wide range of applications. We evaluated the performance of Flexbar on a simulated sequencing dataset. Our program outperforms other tools in terms of speed and is among the best tools in the presented quality benchmark.
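The way the two forms of parallelism described above can share the alignment work is shown in the following Python sketch. It is not Flexbar's code. Reads are split into batches handled by a thread pool (multi-threading). Within a batch, the dynamic-programming recurrence is evaluated for all reads in lock-step, with each DP cell holding one value per read. That mirrors one common SIMD strategy, inter-sequence vectorization, where each vector lane carries a different alignment. The unit-cost global edit distance is an assumed stand-in for Flexbar's actual scoring. The names batch_edit_distance, parallel_edit_distances, batch_size and threads are hypothetical. In CPython the interpreter lock and interpreted loops mean this gives no real speed-up; it only shows the work split and data layout.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List


def batch_edit_distance(reads: List[str], pattern: str) -> List[int]:
    """Unit-cost global edit distance of each read vs. pattern, lane-wise.

    Each DP cell is a list with one entry per read ("lane"), imitating a
    SIMD register holding several alignments. Assumes equal read lengths.
    """
    if not reads:
        return []
    n = len(reads[0])
    assert all(len(r) == n for r in reads), "lanes must have equal length"
    lanes = len(reads)
    # Row 0 of the DP matrix: aligning an empty pattern costs j insertions.
    prev = [[j] * lanes for j in range(n + 1)]
    for i in range(1, len(pattern) + 1):
        p = pattern[i - 1]
        cur = [[i] * lanes]  # column 0: i deletions
        for j in range(1, n + 1):
            # One "vector instruction": the same update applied to all lanes.
            cur.append([
                min(prev[j - 1][l] + (reads[l][j - 1] != p),
                    prev[j][l] + 1,
                    cur[j - 1][l] + 1)
                for l in range(lanes)
            ])
        prev = cur
    return prev[n]


def parallel_edit_distances(reads: List[str], pattern: str,
                            batch_size: int = 4, threads: int = 2) -> List[int]:
    """Split reads into batches and score each batch on a worker thread."""
    batches = [reads[k:k + batch_size] for k in range(0, len(reads), batch_size)]
    with ThreadPoolExecutor(max_workers=threads) as pool:
        results = list(pool.map(lambda b: batch_edit_distance(b, pattern), batches))
    # Flatten while keeping the original read order.
    return [d for batch in results for d in batch]
```

Grouping equal-length reads into a batch matters for lane-wise evaluation, because all lanes step through the same DP cells; reads of different lengths would need padding or separate batches.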

AVAILABILITY AND IMPLEMENTATION

https://github.com/seqan/flexbar.

CONTACT

johannes.roehr@fu-berlin.de or knut.reinert@fu-berlin.de.
