School of Biology, Georgia Institute of Technology, Atlanta, GA 30332, USA.
Nucleic Acids Res. 2012 May;40(9):e67. doi: 10.1093/nar/gks047. Epub 2012 Jan 28.
The rapid expansion in the quantity and quality of RNA-Seq data requires the development of sophisticated high-performance bioinformatics tools capable of rapidly transforming these data into meaningful information that is easily interpretable by biologists. Currently available analysis tools are often not easily installed by the general biologist, and most of them lack the inherent parallel-processing capabilities widely recognized as an essential feature of next-generation bioinformatics tools. We present here a user-friendly and fully automated RNA-Seq analysis pipeline (R-SAP) with built-in multi-threading capability to analyze and quantitate high-throughput RNA-Seq datasets. R-SAP follows a hierarchical decision-making procedure to accurately characterize various classes of transcripts and achieves a near-linear decrease in data processing time as the number of threads increases. In addition, RNA expression level estimates obtained using R-SAP display high concordance with levels measured by microarrays.