Computational Biology Unit, Department of Informatics, University of Bergen, Thormohlens Gate 55, Bergen, 5009, Norway.
BMC Bioinformatics. 2020 Mar 18;21(1):110. doi: 10.1186/s12859-020-3433-x.
As the cost of DNA sequencing falls, increasing amounts of RNA-Seq data are being generated, giving novel insight into gene expression and regulation. Before gene expression can be analyzed, the RNA-Seq data must be processed through a number of steps that result in a quantification of the expression of each gene/transcript in each analyzed sample. A number of workflows are available to help researchers perform these steps on their own data, or on public data in order to re-analyze it with novel software or reference data. However, many of the existing workflows are limited to specific types of studies. We therefore aimed to develop a maximally general workflow that is applicable to a wide range of data and analysis approaches and that supports research on both model and non-model organisms. Furthermore, we aimed to make the workflow usable by researchers with limited programming skills.
Using the workflow management system Snakemake and the package management system Conda, we have developed a modular, flexible and user-friendly RNA-Seq analysis workflow: RNA-Seq Analysis Snakemake Workflow (RASflow). Building on Snakemake and Conda alleviates challenges with library dependencies and version conflicts and also supports reproducibility. To cover a wide variety of applications, RASflow supports mapping reads to both genomic and transcriptomic assemblies. RASflow has a broad range of potential users: it can be applied by researchers interested in any organism and, because it requires no programming skills, by researchers with different backgrounds. The source code of RASflow is available on GitHub: https://github.com/zhxiaokang/RASflow.
RASflow is a simple and reliable RNA-Seq analysis workflow covering many use cases.