Department of Computer Sciences, University of Wisconsin-Madison, Madison, WI, USA.
BMC Bioinformatics. 2011 Aug 4;12:323. doi: 10.1186/1471-2105-12-323.
BACKGROUND: RNA-Seq is revolutionizing the way transcript abundances are measured. A key challenge in transcript quantification from RNA-Seq data is the handling of reads that map to multiple genes or isoforms. This issue is particularly important for quantification with de novo transcriptome assemblies in the absence of sequenced genomes, as it is difficult to determine which transcripts are isoforms of the same gene. A second significant issue is the design of RNA-Seq experiments, in terms of the number of reads, read length, and whether reads come from one or both ends of cDNA fragments.

RESULTS: We present RSEM, a user-friendly software package for quantifying gene and isoform abundances from single-end or paired-end RNA-Seq data. RSEM outputs abundance estimates, 95% credibility intervals, and visualization files, and it can also simulate RNA-Seq data. Unlike other existing tools, RSEM does not require a reference genome. Thus, in combination with a de novo transcriptome assembler, RSEM enables accurate transcript quantification for species without sequenced genomes. On simulated and real data sets, RSEM performs as well as or better than quantification methods that rely on a reference genome. Taking advantage of RSEM's ability to make effective use of ambiguously mapping reads, we show that accurate gene-level abundance estimates are best obtained with large numbers of short single-end reads. On the other hand, estimates of the relative frequencies of isoforms within single genes may be improved by using paired-end reads, depending on the number of possible splice forms for each gene.

CONCLUSIONS: RSEM is an accurate and user-friendly software tool for quantifying transcript abundances from RNA-Seq data. As it does not rely on the existence of a reference genome, it is particularly useful for quantification with de novo transcriptome assemblies. In addition, RSEM has provided valuable guidance for the cost-efficient design of quantification experiments with RNA-Seq, a technology that is currently relatively expensive.
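RSEM (RNA-Seq by Expectation-Maximization) resolves ambiguously mapping reads probabilistically instead of discarding them. The sketch below illustrates that general idea with a deliberately simplified EM model, under stated assumptions: each read is represented only by the set of transcripts it aligns to, every compatible transcript is assumed to generate the read with probability 1/length, and effective lengths are assumed known. RSEM's published model is richer (it accounts for factors such as fragment length and read start position), so this is an illustration, not RSEM's implementation. The function names and input format (`em_abundance`, `transcript_fractions`, `gene_counts`, tuples of transcript IDs) are hypothetical, and summing isoform counts into gene totals is shown as one simple assumed approach.

```python
from collections import defaultdict


def em_abundance(read_sets, lengths, n_iter=500, tol=1e-10):
    """Simplified EM for reads that align to one or more transcripts.

    read_sets: list of tuples; each tuple holds the IDs of the transcripts a
        read aligns to (hypothetical input format).
    lengths: dict mapping transcript ID -> effective length (assumed known).
    Returns (rho, counts): rho[t] is the estimated fraction of reads from t,
    counts[t] is the expected number of reads assigned to t.
    """
    transcripts = list(lengths)
    n_reads = len(read_sets)
    rho = {t: 1.0 / len(transcripts) for t in transcripts}  # uniform start
    counts = {t: 0.0 for t in transcripts}
    for _ in range(n_iter):
        # E-step: split each read among its compatible transcripts in
        # proportion to rho[t] * P(read | t), assuming P(read | t) = 1 / length.
        counts = {t: 0.0 for t in transcripts}
        for compatible in read_sets:
            weights = {t: rho[t] / lengths[t] for t in compatible}
            total = sum(weights.values())
            for t, w in weights.items():
                counts[t] += w / total
        # M-step: read fractions are expected counts divided by total reads.
        new_rho = {t: counts[t] / n_reads for t in transcripts}
        converged = max(abs(new_rho[t] - rho[t]) for t in transcripts) < tol
        rho = new_rho
        if converged:
            break
    return rho, counts


def transcript_fractions(rho, lengths):
    """Convert read fractions to transcript fractions by dividing out length."""
    scaled = {t: rho[t] / lengths[t] for t in rho}
    z = sum(scaled.values())
    return {t: v / z for t, v in scaled.items()}


def gene_counts(counts, transcript_to_gene):
    """Sum isoform expected counts into gene-level totals (assumed mapping)."""
    genes = defaultdict(float)
    for t, c in counts.items():
        genes[transcript_to_gene[t]] += c
    return dict(genes)


# Usage: isoforms A and B of one gene, 6 reads unique to A, 2 unique to B,
# and 8 reads that align equally well to both.
reads = [("A",)] * 6 + [("B",)] * 2 + [("A", "B")] * 8
lengths = {"A": 1000, "B": 1000}
rho, counts = em_abundance(reads, lengths)
print(round(rho["A"], 3))  # 0.75
print({t: round(c, 2) for t, c in counts.items()})  # {'A': 12.0, 'B': 4.0}
genes = gene_counts(counts, {"A": "g1", "B": "g1"})
print({g: round(c, 2) for g, c in genes.items()})  # {'g1': 16.0}
```

In this toy example EM assigns 6 of the 8 shared reads to A and 2 to B, following the evidence from the uniquely mapping reads; splitting the shared reads evenly would instead give 10 and 6. At the gene level the ambiguity disappears entirely, since all 16 reads belong to the same gene, which is consistent with the abstract's observation that gene-level estimates benefit from many reads even when individual reads are ambiguous.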