Department of Computer Science, University of California, Berkeley, Berkeley, CA 94720, USA.
Bioinformatics. 2013 Jul 1;29(13):1631-7. doi: 10.1093/bioinformatics/btt197. Epub 2013 May 14.
The estimation of isoform abundances from RNA-Seq data requires a time-intensive step of mapping reads to either an assembled or previously annotated transcriptome, followed by an optimization procedure for deconvolution of multi-mapping reads. These procedures are essential for downstream analyses such as differential expression. In cases where it is desirable to adjust the underlying annotation, for example, on the discovery of novel isoforms or errors in existing annotations, current pipelines must be rerun from scratch. This makes it difficult to update abundance estimates after re-annotation, or to explore the effect of changes in the transcriptome on analyses. We present a novel, efficient algorithm for updating abundance estimates from RNA-Seq experiments on re-annotation that does not require re-analysis of the entire dataset. Our approach is based on a fast partitioning algorithm for identifying transcripts whose abundances may depend on the added or deleted isoforms, and on a fast follow-up approach to re-estimating abundances for all transcripts. We demonstrate the effectiveness of our methods by showing how to synchronize RNA-Seq abundance estimates with the daily RefSeq incremental updates. Thus, we provide a practical approach to maintaining relevant databases of RNA-Seq-derived abundance estimates even as annotations are being constantly revised.
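The partitioning idea can be illustrated with a minimal sketch. The abstract does not specify how the partition is built, so the following is an assumption rather than the ReXpress implementation: transcripts are grouped into connected components whenever a multi-mapping read aligns to more than one of them, and only the components containing an added or deleted isoform are flagged for re-estimation. The function name `affected_transcripts` and the input layout (one list of transcript IDs per read) are hypothetical.

```python
def affected_transcripts(read_alignments, changed):
    """Return the transcripts whose abundances may depend on `changed`.

    read_alignments: iterable of lists; each list holds the transcript IDs
                     that one read aligns to (hypothetical input layout).
    changed:         set of transcript IDs that were added or deleted.
    """
    parent = {}  # union-find forest over transcript IDs

    def find(x):
        # Walk to the root, halving the path along the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    # Changed transcripts are registered even if no read maps to them.
    for t in changed:
        parent.setdefault(t, t)

    # A read that maps to several transcripts ties them into one component.
    for targets in read_alignments:
        for t in targets:
            parent.setdefault(t, t)
        for t in targets[1:]:
            union(targets[0], t)

    changed_roots = {find(t) for t in changed}
    return {t for t in list(parent) if find(t) in changed_roots}


if __name__ == "__main__":
    reads = [["A", "B"], ["B", "C"], ["D", "E"], ["F"]]
    # Isoform C shares reads (via B) with A, so all three are flagged.
    print(sorted(affected_transcripts(reads, {"C"})))
```

In this sketch, the alignments for a newly added isoform are assumed to be already included in `read_alignments`; transcripts outside the flagged components (here D, E and F) share no reads with the changed isoform.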
Our methods are implemented in software called ReXpress and are freely available, together with source code, at http://bio.math.berkeley.edu/ReXpress/.
Supplementary data are available at Bioinformatics online.