Department of Computer Science, University of California, Irvine, USA.
BMC Bioinformatics. 2014 Feb 5;15:42. doi: 10.1186/1471-2105-15-42.
Next-generation sequencing (NGS) enables rapid production of billions of bases at a relatively low cost. Mapping reads from next-generation sequencers to a given reference genome is an important first step in many sequencing applications. Popular read mappers, such as Bowtie and BWA, are optimized to return the top one or a few candidate locations of each read. However, identifying all mapping locations of each read, instead of just one or a few, is also important in some sequencing applications, such as ChIP-seq for discovering binding sites in repeat regions and RNA-seq for transcript abundance estimation.
Here we present Hobbes2, a software package designed for fast and accurate alignment of NGS reads that specializes in identifying all mapping locations of each read. Hobbes2 efficiently identifies all mapping locations of reads using a novel technique that uses additional prefix q-grams to improve filtering. We extensively compare Hobbes2 with state-of-the-art read mappers and show that Hobbes2 can be an order of magnitude faster than other read mappers while consuming less memory and achieving similar accuracy.
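The abstract does not spell out the filtering algorithm. As a hedged illustration of the general idea behind q-gram filtering, and not Hobbes2's actual prefix q-gram technique (whose details are given in the full paper), the C++ sketch below applies the pigeonhole principle under a mismatch-only (Hamming distance) model and ignores insertions and deletions. The read is split into k + t non-overlapping q-grams; since k mismatches can break at most k of them, any true location with at most k mismatches leaves at least t q-grams matching exactly. Every reference position supported by at least t q-gram hits is then verified, and all passing positions are reported. The names `QgramIndex`, `buildIndex`, `hamming`, `mapAll` and the parameter `t` are hypothetical and chosen only for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical index: each length-q substring (q-gram) of the reference
// maps to the list of positions where it starts.
using QgramIndex = std::unordered_map<std::string, std::vector<std::size_t>>;

QgramIndex buildIndex(const std::string& ref, std::size_t q) {
    assert(q > 0 && "q-gram length must be positive");
    QgramIndex index;
    for (std::size_t i = 0; i + q <= ref.size(); ++i) {
        index[ref.substr(i, q)].push_back(i);
    }
    return index;
}

// Number of mismatches between `read` and ref[start, start + read.size()).
// The caller guarantees the window lies inside the reference.
std::size_t hamming(const std::string& ref, std::size_t start, const std::string& read) {
    std::size_t mismatches = 0;
    for (std::size_t j = 0; j < read.size(); ++j) {
        if (ref[start + j] != read[j]) ++mismatches;
    }
    return mismatches;
}

// Report ALL reference start positions where `read` aligns with <= k mismatches.
// The read is cut into k + t non-overlapping q-grams; a candidate start is
// verified only if at least t of those q-grams hit it exactly (pigeonhole filter).
std::vector<std::size_t> mapAll(const std::string& ref, const QgramIndex& index,
                                const std::string& read, std::size_t q,
                                std::size_t k, std::size_t t) {
    if (q == 0 || t == 0 || (k + t) * q > read.size()) {
        throw std::invalid_argument("need q > 0, t > 0 and (k + t) * q <= read length");
    }
    std::map<std::size_t, std::size_t> votes;  // candidate read start -> q-gram hits
    for (std::size_t g = 0; g < k + t; ++g) {
        const std::size_t offset = g * q;     // where this q-gram sits in the read
        auto it = index.find(read.substr(offset, q));
        if (it == index.end()) continue;      // q-gram absent from the reference
        for (std::size_t pos : it->second) {
            if (pos >= offset) ++votes[pos - offset];  // implied start of the read
        }
    }
    std::vector<std::size_t> hits;            // sorted, since std::map is ordered
    for (const auto& [start, count] : votes) {
        if (count >= t && start + read.size() <= ref.size() &&
            hamming(ref, start, read) <= k) {
            hits.push_back(start);
        }
    }
    return hits;
}
```

For example, with reference `ACGTACGTACGTACGA`, an index built with q = 4, read `ACGTACGT`, k = 1 and t = 1, `mapAll` returns positions 0, 4 and 8: two exact occurrences and one with a single mismatch, all reported rather than only the best one. Raising t (with more, shorter q-grams) demands more exact q-gram support before a candidate reaches the comparatively costly verification step, which is one general way that extra q-grams can tighten a filter.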
We propose Hobbes2 to improve the accuracy of read mapping, with a focus on identifying all mapping locations of each read. Hobbes2 is implemented in C++, and the source code is freely available for download at http://hobbes.ics.uci.edu.