Kaushik Gaurav, Ivkovic Sinisa, Simonovic Janko, Tijanic Nebojsa, Davis-Dusenbery Brandi, Kural Deniz
Seven Bridges Genomics, 1 Main Street, Cambridge, MA 02140, USA. *Corresponding author.
Pac Symp Biocomput. 2017;22:154-165. doi: 10.1142/9789813207813_0016.
As biomedical data has become increasingly easy to generate in large quantities, the methods used to analyze it have proliferated rapidly. Reproducible and reusable methods are required to learn from large volumes of data reliably. To address this issue, numerous groups have developed workflow specifications or execution engines, which provide a framework with which to perform a sequence of analyses. One such specification is the Common Workflow Language, an emerging standard which provides a robust and flexible framework for describing data analysis tools and workflows. In addition, reproducibility can be furthered by executors or workflow engines, which interpret the specification and enable additional features such as error logging, file organization, and optimizations to computation and job scheduling, and which allow for easy computing on large volumes of data. To this end, we have developed the Rabix Executor, an open-source workflow engine designed to improve reproducibility through reusability and interoperability of workflow descriptions.