Varet Hugo, Brillet-Guéguen Loraine, Coppée Jean-Yves, Dillies Marie-Agnès
Institut Pasteur, Plate-forme Transcriptome & Epigenome, Biomics, Centre d'Innovation et Recherche Technologique (Citech), Paris, France.
Institut Pasteur, Hub Bioinformatique et Biostatistique, Centre de Bioinformatique, Biostatistique et Biologie Intégrative (C3BI, USR 3756 IP CNRS), Paris, France.
PLoS One. 2016 Jun 9;11(6):e0157022. doi: 10.1371/journal.pone.0157022. eCollection 2016.
Several R packages exist for the detection of differentially expressed genes from RNA-Seq data. The analysis process includes three main steps: normalization, dispersion estimation, and testing for differential expression. Quality control steps along this process are recommended but not mandatory, and failing to check the characteristics of the dataset may lead to spurious results. In addition, normalization methods and statistical models are not interchangeable across packages without adequate transformations, of which users are often unaware. Dedicated analysis pipelines are therefore needed to include systematic quality control steps and to prevent errors arising from misuse of the proposed methods.
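To make the point about non-interchangeable normalization concrete, here is a minimal Python sketch (not the R code of either package) of one such transformation. It assumes the commonly described conventions that DESeq2-style size factors divide counts directly, whereas edgeR-style normalization factors multiply the library sizes and are scaled so that their product is 1. The function names and the toy count matrix are hypothetical, chosen only for illustration.

```python
import math


def median(values):
    """Median of a non-empty list of numbers."""
    s = sorted(values)
    n = len(s)
    mid = n // 2
    return s[mid] if n % 2 else 0.5 * (s[mid - 1] + s[mid])


def median_of_ratios_size_factors(counts):
    """Median-of-ratios size factors (hypothetical helper).

    counts: list of genes, each a list of per-sample raw counts.
    """
    n_samples = len(counts[0])
    # Keep genes with nonzero counts in every sample so logs are defined.
    kept = [g for g in counts if all(c > 0 for c in g)]
    # Log of the per-gene geometric mean across samples.
    log_geo = [sum(math.log(c) for c in g) / n_samples for g in kept]
    factors = []
    for j in range(n_samples):
        log_ratios = [math.log(g[j]) - lg for g, lg in zip(kept, log_geo)]
        factors.append(math.exp(median(log_ratios)))
    return factors


def to_library_scaled_factors(size_factors, lib_sizes):
    """Convert size factors to factors that multiply library sizes.

    Assumption: the target convention wants factors whose product is 1,
    with effective library size = library size * factor.
    """
    raw = [sf / ls for sf, ls in zip(size_factors, lib_sizes)]
    scale = math.exp(sum(math.log(r) for r in raw) / len(raw))
    return [r / scale for r in raw]


# Toy data: sample 2 is sequenced exactly twice as deeply as sample 1.
counts = [[10, 20], [100, 200], [50, 100], [5, 10]]
lib_sizes = [sum(g[j] for g in counts) for j in range(2)]
size_factors = median_of_ratios_size_factors(counts)
norm_factors = to_library_scaled_factors(size_factors, lib_sizes)
```

In this toy case the depth difference is fully captured by the library sizes, so the converted factors are both 1 even though the size factors differ by a factor of 2. Passing one kind of factor where the other is expected would therefore distort the normalization.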
SARTools is an R pipeline for differential analysis of RNA-Seq count data. It can handle designs involving two or more conditions of a single biological factor, with or without a blocking factor (such as a batch effect or a sample pairing). It is based on DESeq2 and edgeR and is composed of an R package and two R script templates (one for DESeq2 and one for edgeR). By tuning a small number of parameters and executing one of the R scripts, users gain access to the full results of the analysis, including lists of differentially expressed genes and an HTML report that (i) displays diagnostic plots for quality control and for checking model hypotheses and (ii) keeps track of the whole analysis process, the parameter values, and the versions of the R packages used.
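The kind of design described above (one biological factor, optionally with a blocking factor) can be sketched as a treatment-coded design matrix. The Python snippet below is an illustration only and is not SARTools code. The function name, the column naming, and the choice of the alphabetically first level as reference are assumptions made for this example.

```python
def design_matrix(condition, batch=None):
    """Build a treatment-coded design matrix (hypothetical helper).

    Columns: intercept, then blocking-factor indicators (if any),
    then condition indicators. The alphabetically first level of each
    factor is taken as the reference and gets no column.
    """
    def dummies(labels, prefix):
        levels = sorted(set(labels))
        others = levels[1:]  # drop the reference level
        names = [f"{prefix}{lv}" for lv in others]
        cols = [[1 if lab == lv else 0 for lv in others] for lab in labels]
        return names, cols

    names = ["Intercept"]
    rows = [[1] for _ in condition]
    factors = []
    if batch is not None:
        factors.append((batch, "batch"))
    factors.append((condition, "condition"))
    for labels, prefix in factors:
        new_names, cols = dummies(labels, prefix)
        names += new_names
        rows = [r + c for r, c in zip(rows, cols)]
    return names, rows


# Four samples: two conditions, each measured in two batches.
condition = ["ctrl", "ctrl", "treated", "treated"]
batch = ["b1", "b2", "b1", "b2"]
names, rows = design_matrix(condition, batch)
names_simple, rows_simple = design_matrix(condition)
```

With the blocking factor included, the condition coefficient is estimated after accounting for batch-to-batch differences. Without it, the design reduces to an intercept plus one condition column.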
SARTools provides systematic quality controls of the dataset as well as diagnostic plots that help to tune the model parameters. It gives access to the main parameters of DESeq2 and edgeR and prevents untrained users from misusing some functionalities of both packages. By keeping track of all the parameters of the analysis process, it meets the requirements of reproducible research.