Ortiz-Martínez Daniel
Department of Mathematics and Computer Science, Universitat de Barcelona, Gran Via de les Corts Catalanes, 585, 08007, Barcelona, Spain.
BMC Bioinformatics. 2025 Apr 16;26(1):106. doi: 10.1186/s12859-025-06108-1.
Bioinformatics data analysis faces significant challenges. As data analysis often takes the form of pipelines or workflows, workflow managers (WfMs) have become essential. Data flow programming is the preferred approach in WfMs: parallel processes are activated reactively as their inputs become available. While this paradigm typically follows a linear, acyclic progression, cyclic workflows are sometimes necessary in bioinformatics analyses. These cyclic workflows also present an opportunity to explore workflow interactivity, a feature not widely implemented in existing WfMs.
We propose DeBasher, a tool that adopts the flow-based programming (FBP) paradigm, in which the workflow components control their own life cycle and can store state information, allowing the execution of complex workflows that include cycles. DeBasher also incorporates a powerful model of interactivity, in which the user can alter the behavior of a running workflow. Additionally, DeBasher allows the user to define triggers that initiate the execution of a complete workflow or a part of it. The ability to execute stateful processes that control their own life cycle also has applications in dynamic scheduling tasks. Furthermore, DeBasher offers a series of extra features, including the combination of multiple workflows at runtime through a feature we have called runtime piping, switching to static scheduling to increase scalability, and implementing processes in multiple languages. DeBasher has been successfully used to process 131.7 TB of genomic data by means of a variant calling pipeline.
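The idea of stateful components that control their own life cycle inside a cyclic workflow can be illustrated with a minimal sketch. It is written in plain Python with threads and queues and does not use DeBasher's syntax or API, which this abstract does not show. The names `step`, `check`, `run_workflow`, and `STOP`, as well as the square-root refinement task, are assumptions chosen only for illustration. Each component runs its own loop, reads packets from an input queue, and decides for itself when to terminate; `check` keeps state (a count of finished items) and feeds unfinished packets back to `step`, which closes the cycle.

```python
import queue
import threading

STOP = object()  # hypothetical sentinel telling a component to shut down


def step(inbox, to_check):
    """Apply one Newton iteration for sqrt(target) to each incoming packet."""
    while True:  # the component owns its loop, hence its life cycle
        packet = inbox.get()
        if packet is STOP:
            return
        target, estimate, rounds = packet
        estimate = 0.5 * (estimate + target / estimate)
        to_check.put((target, estimate, rounds + 1))


def check(from_step, back_to_step, results, expected, tol=1e-9):
    """Emit converged packets; send the rest back to `step` (the cycle)."""
    done = 0  # state kept by the component across packets
    while done < expected:
        target, estimate, rounds = from_step.get()
        if abs(estimate * estimate - target) <= tol:
            results.put((target, estimate, rounds))
            done += 1
        else:
            back_to_step.put((target, estimate, rounds))
    back_to_step.put(STOP)  # all items finished: shut the cycle down


def run_workflow(targets):
    """Wire the two components into a cycle and run it on positive numbers."""
    inbox, to_check, results = queue.Queue(), queue.Queue(), queue.Queue()
    workers = [
        threading.Thread(target=step, args=(inbox, to_check)),
        threading.Thread(target=check,
                         args=(to_check, inbox, results, len(targets))),
    ]
    for worker in workers:
        worker.start()
    for target in targets:
        inbox.put((target, target, 0))  # initial estimate is the target itself
    for worker in workers:
        worker.join()
    return sorted(results.get() for _ in targets)


if __name__ == "__main__":
    for target, estimate, rounds in run_workflow([2.0, 9.0, 16.0]):
        print(f"sqrt({target}) ~ {estimate:.6f} after {rounds} rounds")
```

In a purely acyclic, input-driven model, the feedback edge from `check` to `step` could not be expressed directly; here it is just another queue, and termination is decided by the component that holds the relevant state.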
DeBasher is an FBP Bash extension that can be useful in a wide range of situations, in particular when implementing complex workflows or workflows with interactivity or triggers, or when high scalability is required.