Bioinformatics Division, The Walter and Eliza Hall Institute of Medical Research, 1G Royal Parade, Parkville, Victoria 3052, Australia.
Department of Medical Biology, The University of Melbourne, Parkville, Victoria 3010, Australia.
Nucleic Acids Res. 2019 May 7;47(8):e47. doi: 10.1093/nar/gkz114.
We present Rsubread, a Bioconductor software package that provides high-performance alignment and read counting functions for RNA-seq reads. Rsubread builds on the successful Subread suite and adds the ease of use of the R programming environment, creating a matrix of read counts directly as an R object ready for downstream analysis. It integrates read mapping and quantification in a single package and has no software dependencies other than R itself. We demonstrate Rsubread's ability to detect exon-exon junctions de novo and to quantify expression at the level of genes, exons or exon junctions. The resulting read counts can be input directly into a wide range of downstream statistical analyses using other Bioconductor packages. Using SEQC data and simulations, we compare Rsubread to TopHat2, STAR and HTSeq as well as to counting functions in the Bioconductor infrastructure packages. We consider the performance of these tools on the combined quantification task, from raw sequence reads through to summary counts, and in particular evaluate the performance of different combinations of alignment and counting algorithms. We show that Rsubread is faster and uses less memory than competing tools and produces read count summaries that correlate more accurately with true values.