Center for Informatics Sciences, Nile University, Giza, Egypt.
BMC Bioinformatics. 2012 May 4;13:77. doi: 10.1186/1471-2105-13-77.
Over the past decade, the workflow system paradigm has emerged as an efficient and user-friendly approach for developing complex bioinformatics applications. Two popular workflow systems that have gained acceptance in the bioinformatics community are Taverna and Galaxy. Each system has a large user base and supports an ever-growing repository of application workflows. However, workflows developed for one system cannot be imported and executed easily on the other. This lack of interoperability stems from differences in the models of computation, workflow languages, and architectures of the two systems; it limits the sharing of workflows between the user communities and leads to duplicated development effort.
In this paper, we present Tavaxy, a stand-alone system for creating and executing workflows based on an extensible set of re-usable workflow patterns. Tavaxy offers a set of new features that simplify and enhance the development of sequence analysis applications: it allows the integration of existing Taverna and Galaxy workflows in a single environment, and it supports the use of cloud computing capabilities. The integration of existing Taverna and Galaxy workflows is supported seamlessly at both run-time and design-time levels, based on the concepts of hierarchical workflows and workflow patterns. Cloud computing in Tavaxy is flexible: users can either instantiate the whole system on the cloud or delegate the execution of certain sub-workflows to the cloud infrastructure.
Tavaxy shortens the workflow development cycle by introducing workflow patterns that simplify workflow creation. It enables the re-use and integration of existing (sub-)workflows from Taverna and Galaxy, and allows the creation of hybrid workflows. Its additional features exploit recent advances in high-performance cloud computing to cope with the increasing data size and complexity of analysis. The system can be accessed through a cloud-enabled web interface or downloaded and installed to run within the user's local environment. All resources related to Tavaxy are available at http://www.tavaxy.org.