Williams Claire R, Baccarella Alyssa, Parrish Jay Z, Kim Charles C
Department of Biology, University of Washington, Seattle, WA, 98195, USA.
Division of Experimental Medicine, Department of Medicine, University of California, San Francisco, CA, 94143, USA.
BMC Bioinformatics. 2017 Jan 17;18(1):38. doi: 10.1186/s12859-016-1457-z.
RNA-Seq has supplanted microarrays as the preferred method of transcriptome-wide identification of differentially expressed genes. However, RNA-Seq analysis is still rapidly evolving, with a large number of tools available for each of the three major processing steps: read alignment, expression modeling, and identification of differentially expressed genes. Although some studies have benchmarked these tools against gold standard gene expression sets, few have evaluated their performance in concert with one another. Additionally, there is a general lack of testing of such tools on real-world, physiologically relevant datasets, which often possess qualities not reflected in tightly controlled reference RNA samples or synthetic datasets.
Here, we evaluate 219 combinatorial implementations of the most commonly used analysis tools for their impact on differential gene expression analysis by RNA-Seq. A test dataset was generated using highly purified human classical and nonclassical monocyte subsets from a clinical cohort, allowing us to evaluate the performance of 495 unique workflows when accounting for differences in expression units and gene- versus transcript-level estimation. We find that the choice of methodologies leads to wide variation in the number of genes called significant, as well as in performance as gauged by precision and recall. Precision and recall were calculated by comparing our RNA-Seq results to those from four previously published microarray and BeadChip analyses of the same cell populations. The method of differential gene expression identification had the strongest impact on performance, with smaller impacts from the choice of read aligner and expression modeler. Many workflows exhibited similar overall performance but differed in their calibration, with some biased toward higher precision and others toward higher recall.
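To make the precision and recall comparison concrete, the following is a minimal sketch of how a workflow's list of significant genes could be scored against a reference gene set. It is not the authors' code: the function name `precision_recall`, the gene identifiers, and the two workflow labels are hypothetical placeholders chosen for illustration, and the reference set here stands in for the genes reported by the earlier microarray and BeadChip studies.

```python
def precision_recall(called, reference):
    """Score a set of genes called significant against a reference set.

    precision = true positives / all genes called by the workflow
    recall    = true positives / all genes in the reference set
    """
    called, reference = set(called), set(reference)
    true_positives = len(called & reference)
    precision = true_positives / len(called) if called else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    return precision, recall


# Hypothetical reference set (illustrative identifiers, not real data).
reference_genes = {"GENE_A", "GENE_B", "GENE_C", "GENE_D"}

# Two hypothetical workflows with different calibration.
workflow_calls = {
    "workflow_conservative": {"GENE_A", "GENE_B"},
    "workflow_permissive": {
        "GENE_A", "GENE_B", "GENE_C", "GENE_X", "GENE_Y", "GENE_Z",
    },
}

for name, genes in workflow_calls.items():
    p, r = precision_recall(genes, reference_genes)
    print(f"{name}: precision={p:.2f} recall={r:.2f}")
```

In this toy case the conservative workflow calls fewer genes and reaches higher precision at lower recall, while the permissive workflow shows the opposite bias, which is the kind of calibration difference described above.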
RNA-Seq workflows vary substantially in how well they identify differentially expressed genes. The higher-performing workflows differ in their precision/recall tradeoff, so the choice of workflow should take into account how the results will be used in subsequent applications. Our analyses highlight the performance characteristics of these workflows, and the data generated in this study could also serve as a useful resource for future development of software for RNA-Seq analysis.