From the desktop to the grid: scalable bioinformatics via workflow conversion.

Author information

de la Garza Luis, Veit Johannes, Szolek Andras, Röttig Marc, Aiche Stephan, Gesing Sandra, Reinert Knut, Kohlbacher Oliver

Affiliations

Center for Bioinformatics and Dept. of Computer Science, University of Tübingen, Sand 14, Tübingen, 72070, Germany.

Algorithmic Bioinformatics, Computer Science Institute, Freie Universität Berlin, Takustr. 9, Berlin, 14195, Germany.

Publication information

BMC Bioinformatics. 2016 Mar 12;17:127. doi: 10.1186/s12859-016-0978-9.

Abstract

BACKGROUND

Reproducibility is one of the tenets of the scientific method. Scientific experiments often comprise complex data flows, selection of adequate parameters, and analysis and visualization of intermediate and end results. Breaking such an experiment down into small, repeatable, well-defined tasks that work together, each with well-defined inputs, parameters, and outputs, offers immediate benefits such as identifying bottlenecks and pinpointing sections that could benefit from parallelization. Workflows rest upon this notion of splitting complex work into the joint effort of several manageable tasks. Several engines give users the ability to design and execute workflows. Each engine was created to address certain problems of a specific community, so each has its advantages and shortcomings. Furthermore, not all features of all workflow engines are royalty-free, an aspect that could drive away members of the scientific community.

RESULTS

We have developed a set of tools that enables the scientific community to benefit from workflow interoperability. We developed a platform-independent structured representation of the parameters, inputs, and outputs of command-line tools in so-called Common Tool Descriptor documents. We have also overcome the shortcomings and combined the features of two royalty-free workflow engines with a substantial user community: the Konstanz Information Miner, an engine which we see as a formidable workflow editor, and the Grid and User Support Environment, a web-based framework able to interact with several high-performance computing resources. We have thus created a free and highly accessible way to design workflows on a desktop computer and execute them on high-performance computing resources.
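
To make the idea of a structured, engine-neutral tool description more concrete, the following is a minimal sketch in Python. It is not the Common Tool Descriptor schema and not the authors' implementation: the class names, the `kind` field, the tool name `example_tool`, and its flags are all hypothetical, chosen only to illustrate how recording parameters, inputs, and outputs separately from any one workflow engine lets different engines build the same command line.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Parameter:
    """One command-line argument of a tool (hypothetical model, not the CTD schema)."""
    name: str
    flag: str
    value: str
    kind: str = "parameter"  # assumed categories: "input", "output", "parameter"


@dataclass
class ToolDescriptor:
    """Engine-neutral description of a command-line tool (illustrative only)."""
    executable: str
    parameters: List[Parameter] = field(default_factory=list)

    def command_line(self) -> List[str]:
        """Render the description as an argument vector any engine could run."""
        args = [self.executable]
        for p in self.parameters:
            args.extend([p.flag, p.value])
        return args

    def files(self, kind: str) -> List[str]:
        """List the values of all arguments of the given kind."""
        return [p.value for p in self.parameters if p.kind == kind]


# "example_tool" and its flags are placeholders, not a real program.
descriptor = ToolDescriptor(
    executable="example_tool",
    parameters=[
        Parameter("in", "-in", "sample.fasta", kind="input"),
        Parameter("out", "-out", "result.txt", kind="output"),
        Parameter("threads", "-threads", "4"),
    ],
)

print(" ".join(descriptor.command_line()))
print("inputs to stage:", descriptor.files("input"))
```

Because inputs and outputs are marked explicitly, a desktop editor and a remote execution framework could both work out which files to stage in and out without knowing anything else about the tool.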

CONCLUSIONS

Our work will not only reduce time spent on designing scientific workflows, but also make executing workflows on remote high-performance computing resources more accessible to technically inexperienced users. We strongly believe that our efforts not only decrease the turnaround time to obtain scientific results but also have a positive impact on reproducibility, thus elevating the quality of obtained scientific results.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d898/4788856/a537b01c9ee4/12859_2016_978_Fig1_HTML.jpg
