EPCC, The University of Edinburgh, Edinburgh, United Kingdom.
Institute for Cell Biology and SynthSys, School of Biological Sciences, The University of Edinburgh, Edinburgh, United Kingdom.
PLoS Comput Biol. 2021 Feb 25;17(2):e1008622. doi: 10.1371/journal.pcbi.1008622. eCollection 2021 Feb.
Workflow management systems represent, manage, and execute multistep computational analyses and offer many benefits to bioinformaticians. They provide a common language for describing analysis workflows, contributing to reproducibility and to building libraries of reusable components. They can support both incremental build and re-entrancy, that is, the ability to selectively re-execute parts of a workflow in the presence of additional inputs or changes in configuration and to resume execution from where a workflow previously stopped. Many workflow management systems enhance portability by supporting the use of containers, high-performance computing (HPC) systems, and clouds. Most importantly, workflow management systems allow bioinformaticians to delegate how their workflows are run to the workflow management system and its developers. This frees the bioinformaticians to focus on what these workflows should do, on their data analyses, and on their science. RiboViz is a package to extract biological insight from ribosome profiling data to help advance understanding of protein synthesis. At the heart of RiboViz is an analysis workflow, implemented in a Python script. To conform to best practices for scientific computing, which recommend the use of build tools to automate workflows and the reuse of code instead of rewriting it, the authors reimplemented this workflow within a workflow management system. To select a workflow management system, a rapid survey of available systems was undertaken, and candidates were shortlisted: Snakemake, cwltool, Toil, and Nextflow. Each candidate was evaluated by quickly prototyping a subset of the RiboViz workflow, and Nextflow was chosen. The selection process took 10 person-days, a small cost for the assurance that Nextflow satisfied the authors' requirements.
The use of prototyping can offer a low-cost way of making a more informed selection of software to use within projects, rather than relying solely upon reviews and recommendations by others.
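The incremental build and re-entrancy behaviour described in the abstract can be illustrated with a minimal Python sketch. It is not taken from RiboViz, Nextflow, or any of the other shortlisted systems, whose caching mechanisms are more elaborate. The names `run_step`, `file_digest`, and `to_upper`, and the JSON state file, are hypothetical and introduced only for illustration. The sketch assumes a step can be skipped when its output exists and the content hashes of its inputs match those recorded at its last successful run.

```python
import hashlib
import json
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def run_step(name, inputs, output, action, state_file):
    """Run `action` only if the inputs changed or the output is missing.

    Returns True if the step executed and False if it was skipped.
    (Hypothetical helper, not part of any workflow management system.)
    """
    # Load the record of previously completed steps, if there is one.
    state = json.loads(state_file.read_text()) if state_file.exists() else {}
    # The signature captures the current contents of every input file.
    signature = {str(p): file_digest(p) for p in inputs}
    if output.exists() and state.get(name) == signature:
        return False  # up to date, so skip the step
    action(inputs, output)
    # Record completion so that a later run can resume after this step.
    state[name] = signature
    state_file.write_text(json.dumps(state))
    return True


def to_upper(inputs, output):
    """Hypothetical stand-in for a real analysis step."""
    output.write_text("".join(p.read_text().upper() for p in inputs))
```

Calling `run_step` twice with unchanged inputs executes the action only once, and editing an input file causes the step to run again on the next call. Because completion is recorded per step, a workflow built from such steps that stops partway through can be restarted without repeating the steps already finished.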