Computation Institute, University of Chicago and Argonne National Laboratory, Chicago, IL, USA.
BMC Bioinformatics. 2010 Nov 2;11:542. doi: 10.1186/1471-2105-11-542.
In the biological and medical domains, web services have made data and computational functionality accessible in a unified manner, which has helped automate data pipelines that were previously run manually. Workflow technology is widely used to orchestrate multiple services in support of in-silico research. The Cancer Biomedical Informatics Grid (caBIG) is an information network that enables the sharing of cancer research resources, and caGrid is its underlying service-based computation infrastructure. CaBIG requires that services be composed and orchestrated in a given sequence to realize data pipelines, which are often called scientific workflows.
CaGrid selected Taverna as its workflow execution system because of, among other features, its integration with web service technology, its support for a wide range of web services, and its plug-in architecture, which eases the integration of third-party extensions. The caGrid Workflow Toolkit (the toolkit for short), an extension to the Taverna workflow system, is designed and implemented to ease building and running caGrid workflows. It supports users through the phases of working with workflows: service discovery, composition and orchestration, data access, and secure service invocation, all of which the caGrid community has identified as challenging in a multi-institutional, cross-discipline domain.
By extending the Taverna Workbench, the caGrid Workflow Toolkit provides a comprehensive solution for composing and coordinating services in caGrid that would otherwise remain isolated and disconnected from each other. With it, users can access more than 140 services and are offered a rich set of features, including discovery of data and analytical services, query and transfer of data, security protection for service invocations, state management in service interactions, and sharing of workflows, experiences and best practices. The proposed solution is general enough to be applicable and reusable within other service-computing infrastructures that use a similar technology stack.