Spinozzi Giulio, Tini Valentina, Adorni Alessia, Falini Brunangelo, Martelli Maria Paola
Department of Medicine, Section of Hematology, University of Perugia, Perugia, Italy.
BMC Bioinformatics. 2020 Dec 21;21(Suppl 19):574. doi: 10.1186/s12859-020-03846-2.
RNA-Seq is an increasingly used methodology for studying both coding and non-coding RNA expression. Many software tools are available for each phase of RNA-Seq analysis, and each of them uses different algorithms. Furthermore, the analysis consists of several steps: alignment (primary analysis), quantification and differential analysis (secondary analysis), and any tertiary analysis. Dealing with each step separately can therefore be time-consuming, in addition to requiring computing knowledge. For this reason, an automated pipeline that manages the entire analysis through a single initial command, and that is easy to use even for those without computer skills, can be useful. Given the vast availability of RNA-Seq analysis tools, it is first necessary to select a limited number of pipelines to include. For this purpose, we compared eight pipelines obtained by combining the most used tools, and for each one we evaluated peak RAM usage, run time, sensitivity and specificity.
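As an illustration of the two accuracy metrics named above, the following minimal sketch computes sensitivity and specificity for a set of genes called differentially expressed, given a reference set of truly differential genes. The function name, the gene identifiers and the use of a known truth set are assumptions made for illustration; the abstract does not describe how the authors computed these values.

```python
def sensitivity_specificity(called, truth, universe):
    """Return (sensitivity, specificity) for a differential-expression call set.

    called   -- genes reported as differentially expressed (hypothetical input)
    truth    -- genes known to be differentially expressed (hypothetical input)
    universe -- all genes that were tested
    """
    universe = set(universe)
    # Ignore any identifiers that were never tested.
    called = set(called) & universe
    truth = set(truth) & universe

    tp = len(called & truth)             # true positives
    fp = len(called - truth)             # false positives
    fn = len(truth - called)             # false negatives
    tn = len(universe - called - truth)  # true negatives

    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


if __name__ == "__main__":
    genes = [f"g{i}" for i in range(1, 11)]   # ten tested genes (made up)
    truth = {"g1", "g2", "g3", "g4"}          # made-up reference set
    called = {"g1", "g2", "g3", "g5"}         # made-up pipeline output
    sens, spec = sensitivity_specificity(called, truth, genes)
    print(f"sensitivity={sens:.3f} specificity={spec:.3f}")
```

Sensitivity here is the fraction of truly differential genes that a pipeline recovers, and specificity is the fraction of non-differential genes that it correctly leaves uncalled.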
The pipeline with the shortest run times, lowest RAM consumption and highest sensitivity is the one consisting of HISAT2 for alignment, featureCounts for quantification and edgeR for differential analysis. Here, we developed ARPIR, an automated pipeline that uses this combination by default but also lets the user choose, among different tools, those belonging to the best-performing pipelines.
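To make the first two stages of this default combination concrete, the sketch below builds HISAT2 and featureCounts command lines for single-end or paired-end input. It is not ARPIR's code: the helper names, file names and thread count are hypothetical, only widely documented options of the two tools are used, and the edgeR step (which runs in R) is left out.

```python
from typing import List, Optional, Sequence


def hisat2_command(index: str, out_sam: str, reads1: str,
                   reads2: Optional[str] = None, threads: int = 4) -> List[str]:
    """Build a HISAT2 alignment command (hypothetical helper, not ARPIR code)."""
    cmd = ["hisat2", "-p", str(threads), "-x", index]
    if reads2 is None:
        cmd += ["-U", reads1]                # single-end reads
    else:
        cmd += ["-1", reads1, "-2", reads2]  # paired-end mates
    cmd += ["-S", out_sam]                   # SAM output file
    return cmd


def featurecounts_command(annotation_gtf: str, out_counts: str,
                          alignments: Sequence[str], paired: bool = False,
                          threads: int = 4) -> List[str]:
    """Build a featureCounts quantification command (hypothetical helper)."""
    cmd = ["featureCounts", "-T", str(threads),
           "-a", annotation_gtf, "-o", out_counts]
    if paired:
        # -p marks the input as paired-end; recent Subread releases may also
        # need --countReadPairs to count fragments rather than reads.
        cmd.append("-p")
    cmd += list(alignments)
    return cmd


if __name__ == "__main__":
    # All paths below are placeholders chosen for illustration.
    align = hisat2_command("genome_index", "sample1.sam",
                           "sample1_R1.fastq", "sample1_R2.fastq")
    count = featurecounts_command("genes.gtf", "counts.txt",
                                  ["sample1.sam"], paired=True)
    print(" ".join(align))
    print(" ".join(count))
    # To actually run them: subprocess.run(align, check=True), and so on.
```

Returning argument lists instead of executing the tools keeps the sketch runnable without HISAT2 or Subread installed and makes the generated commands easy to inspect before launching them.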
ARPIR analyzes RNA-Seq data from groups undergoing different treatments, allows multiple comparisons in a single launch, and can be used for either paired-end or single-end data. All prerequisites can be installed via a configuration script, and the analysis can be launched via a graphical interface or a template script. In addition, ARPIR performs a final tertiary analysis that includes Gene Ontology and pathway analysis. The results can be viewed in an interactive Shiny app and exported as a report (PDF, Word or HTML format). ARPIR is an efficient and easy-to-use tool for RNA-Seq analysis, from quality control to pathway analysis, that lets the user choose between different pipelines.