Institute for Informatics, Ludwig-Maximilians-Universität München, Amalienstraße 17, München, 80333, Germany.
BMC Bioinformatics. 2018 Mar 13;19(1):97. doi: 10.1186/s12859-018-2107-4.
The development of high-throughput experimental technologies, such as next-generation sequencing, has led to new challenges for handling, analyzing and integrating the resulting large and diverse datasets. Bioinformatical analysis of these data commonly requires a number of mutually dependent steps applied to numerous samples for multiple conditions and replicates. To support these analyses, a number of workflow management systems (WMSs) have been developed to allow automated execution of the corresponding analysis workflows. Major advantages of WMSs are the easy reproducibility of results as well as the reusability of workflows or their components.
In this article, we present Watchdog, a WMS for the automated analysis of large-scale experimental data. Its main features include straightforward processing of replicate data, support for distributed computer systems, customizable error detection and manual intervention in workflow execution. Watchdog is implemented in Java, is thus platform-independent, and allows easy sharing of workflows and corresponding program modules. It provides a graphical user interface (GUI) for workflow construction using pre-defined modules as well as a helper script for creating new module definitions. Workflows can be executed using either the GUI or a command-line interface, and a web interface is provided for monitoring the execution status and intervening in case of errors. To illustrate its potential on a real-life example, a comprehensive workflow and modules for the analysis of RNA-seq experiments were implemented and are provided with the software in addition to simple test examples.
Watchdog is a powerful and flexible WMS for the analysis of large-scale high-throughput experiments. We believe it will greatly benefit users both with and without programming skills who want to develop and apply bioinformatical workflows with reasonable overhead. The software, example workflows and comprehensive documentation are freely available at www.bio.ifi.lmu.de/watchdog.