Perspectives on automated composition of workflows in the life sciences.

Affiliations

Utrecht University, 3584 CS Utrecht, The Netherlands.

Leiden University Medical Center, 2333 ZA, Leiden, The Netherlands.

Publication information

F1000Res. 2021 Sep 7;10:897. doi: 10.12688/f1000research.54159.1. eCollection 2021.

Abstract

Scientific data analyses often combine several computational tools in automated pipelines, or workflows. Thousands of such workflows have been used in the life sciences, though their composition has remained a cumbersome manual process due to a lack of standards for annotation, assembly, and implementation. Recent technological advances have brought the long-standing vision of automated workflow composition back into focus. This article summarizes a recent Lorentz Center workshop dedicated to automated composition of workflows in the life sciences. We survey previous initiatives to automate the composition process and discuss the current state of the art and future perspectives. We start by drawing the "big picture" of the scientific workflow development life cycle, before surveying and discussing current methods, technologies and practices for semantic domain modelling, automation in workflow development, and workflow assessment. Finally, we derive a roadmap of individual and community-based actions to work toward the vision of automated workflow development in the forthcoming years. A central outcome of the workshop is a general description of the workflow life cycle in six stages: 1) scientific question or hypothesis, 2) conceptual workflow, 3) abstract workflow, 4) concrete workflow, 5) production workflow, and 6) scientific results. The transitions between stages are facilitated by diverse tools and methods, usually incorporating domain knowledge in some form. Formal semantic domain modelling is hard and often a bottleneck for the application of semantic technologies. However, life science communities have made considerable progress in this area in recent years and are continuously improving, renewing interest in the application of semantic technologies for workflow exploration, composition and instantiation. Combined with systematic benchmarking with reference data and large-scale deployment of production-stage workflows, such technologies enable a more systematic process of workflow development than we know today. We believe that this can lead to more robust, reusable, and sustainable workflows in the future.
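The abstract refers to semantic technologies for workflow composition without showing how composition can work mechanically. The sketch below is a toy illustration only, not the method of any tool surveyed in the article. It assumes that each tool is annotated with a single input data type and a single output data type (real annotation schemes, such as those based on the EDAM ontology, are far richer) and uses breadth-first search to find the shortest chain of tools that turns a given input type into a requested output type. The `Tool` class, the `compose` function, and all tool and type names in the catalogue are hypothetical.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Tool:
    """A tool annotated with one input data type and one output data type.

    Hypothetical, simplified annotation: real tools may take several inputs
    and produce several outputs, described with ontology terms.
    """
    name: str
    input_type: str
    output_type: str


def compose(tools, source_type, target_type, max_length=5):
    """Breadth-first search for the shortest tool chain from source_type to target_type.

    Returns a list of tool names, or None if no chain of at most
    max_length tools exists in the catalogue.
    """
    queue = deque([(source_type, [])])
    visited = {source_type}  # data types already reachable by a shorter chain
    while queue:
        current_type, chain = queue.popleft()
        if current_type == target_type:
            return chain
        if len(chain) >= max_length:
            continue  # do not extend chains beyond the length limit
        for tool in tools:
            # A tool can be appended if it consumes the current data type.
            if tool.input_type == current_type and tool.output_type not in visited:
                visited.add(tool.output_type)
                queue.append((tool.output_type, chain + [tool.name]))
    return None


# Hypothetical tool catalogue; names and types are illustrative only.
catalogue = [
    Tool("peak_picker", "raw_spectra", "peak_list"),
    Tool("db_search", "peak_list", "peptide_ids"),
    Tool("protein_inference", "peptide_ids", "protein_ids"),
    Tool("quantifier", "peak_list", "quant_table"),
]

print(compose(catalogue, "raw_spectra", "protein_ids"))
# ['peak_picker', 'db_search', 'protein_inference']
```

Breadth-first search is used here because it returns a shortest chain first. Practical composition approaches may instead need to enumerate many candidate workflows and then rank or filter them, which this sketch does not attempt.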
