Srivastava Avi, Sarkar Hirak, Gupta Nitish, Patro Rob
Department of Computer Science, Stony Brook University, Stony Brook, NY 11794-2424, USA.
Bioinformatics. 2016 Jun 15;32(12):i192-i200. doi: 10.1093/bioinformatics/btw277.
The alignment of sequencing reads to a transcriptome is a common and important step in many RNA-seq analysis tasks. When aligning RNA-seq reads directly to a transcriptome (as is common in the de novo setting or when a trusted reference annotation is available), care must be taken to report the potentially large number of multi-mapping locations per read. This can pose a substantial computational burden for existing aligners, and can considerably slow downstream analysis.
We introduce a novel concept, quasi-mapping, and an efficient algorithm implementing this approach for mapping sequencing reads to a transcriptome. By attempting only to report the potential loci of origin of a sequencing read, and not the base-to-base alignment by which it derives from the reference, RapMap (our tool implementing quasi-mapping) is capable of mapping sequencing reads to a target transcriptome substantially faster than existing alignment tools. The algorithm we use to implement quasi-mapping uses several efficient data structures and takes advantage of the special structure of shared sequence prevalent in transcriptomes to rapidly provide highly accurate mapping information. We demonstrate how quasi-mapping can be successfully applied to the problems of transcript-level quantification from RNA-seq reads and the clustering of contigs from de novo assembled transcriptomes into biologically meaningful groups.
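The distinction between reporting candidate loci of origin and computing a base-to-base alignment can be illustrated with a toy sketch. The code below is not RapMap's algorithm and does not reproduce its data structures, which the abstract does not specify; it assumes a plain k-mer hash index and intersects the transcript sets hit by each k-mer of a read. The function names `build_kmer_index` and `candidate_transcripts`, the sequences, and the choice of k are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def build_kmer_index(transcripts, k):
    """Map every k-mer to the set of transcript names that contain it."""
    index = defaultdict(set)
    for name, seq in transcripts.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].add(name)
    return index


def candidate_transcripts(read, index, k):
    """Return transcripts compatible with every indexed k-mer of the read.

    No base-level alignment is computed: only the set of candidate
    transcripts of origin is reported. K-mers absent from the index
    (for example, ones containing a sequencing error) are skipped.
    """
    candidates = None
    for i in range(len(read) - k + 1):
        hits = index.get(read[i:i + k])
        if not hits:
            continue
        candidates = set(hits) if candidates is None else candidates & hits
    return candidates or set()


# Hypothetical toy transcriptome: t1 and t2 share a prefix, as isoforms often do.
transcripts = {
    "t1": "ACGTACGTTAGC",
    "t2": "ACGTACGTTCCA",
    "t3": "GGGGCCCCAAAA",
}
index = build_kmer_index(transcripts, k=4)

shared = candidate_transcripts("ACGTACGT", index, k=4)  # falls in the shared prefix
unique = candidate_transcripts("CGTTAG", index, k=4)    # extends into t1-only sequence
```

In this toy setting, a read drawn from sequence shared by two transcripts yields both as candidates (a multi-mapping read), while a read that extends into transcript-specific sequence is narrowed to a single candidate.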
RapMap is implemented in C++11 and is available as open-source software, under GPL v3, at https://github.com/COMBINE-lab/RapMap
Supplementary data are available at Bioinformatics online.