
From the desktop to the grid: scalable bioinformatics via workflow conversion.

Authors

de la Garza Luis, Veit Johannes, Szolek Andras, Röttig Marc, Aiche Stephan, Gesing Sandra, Reinert Knut, Kohlbacher Oliver

Affiliations

Center for Bioinformatics and Dept. of Computer Science, University of Tübingen, Sand 14, Tübingen, 72070, Germany.

Algorithmic Bioinformatics, Computer Science Institute, Freie Universität Berlin, Takustr. 9, Berlin, 14195, Germany.

Publication information

BMC Bioinformatics. 2016 Mar 12;17:127. doi: 10.1186/s12859-016-0978-9.

DOI: 10.1186/s12859-016-0978-9
PMID: 26968893
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4788856/
Abstract

BACKGROUND

Reproducibility is one of the tenets of the scientific method. Scientific experiments often comprise complex data flows, selection of adequate parameters, and analysis and visualization of intermediate and end results. Breaking down the complexity of such experiments into the joint collaboration of small, repeatable, well-defined tasks, each with well-defined inputs, parameters, and outputs, offers immediate benefits such as identifying bottlenecks and pinpointing sections that could benefit from parallelization. Workflows rest upon the notion of splitting complex work into the joint effort of several manageable tasks. There are several engines that give users the ability to design and execute workflows. Each engine was created to address certain problems of a specific community; therefore, each one has its advantages and shortcomings. Furthermore, not all features of all workflow engines are royalty-free, an aspect that could potentially drive away members of the scientific community.

RESULTS

We have developed a set of tools that enables the scientific community to benefit from workflow interoperability. We developed a platform-free structured representation of the parameters, inputs, and outputs of command-line tools in so-called Common Tool Descriptor documents. We have also overcome the shortcomings and combined the features of two royalty-free workflow engines with a substantial user community: the Konstanz Information Miner, an engine which we see as a formidable workflow editor, and the Grid and User Support Environment, a web-based framework able to interact with several high-performance computing resources. We have thus created a free and highly accessible way to design workflows on a desktop computer and execute them on high-performance computing resources.
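
The idea of a platform-free, structured description of a command-line tool can be illustrated with a minimal sketch. This is not the Common Tool Descriptor format itself: the `Parameter` and `ToolDescriptor` classes, their field names, the `--name value` flag convention, and the tool name `example_tool` are all hypothetical, chosen only to show how one description of parameters, inputs, and outputs could be turned into a concrete invocation by any engine that reads it.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Parameter:
    """One parameter of a command-line tool (hypothetical model, not the CTD schema)."""
    name: str
    kind: str                      # "input", "output", or "option"
    default: Optional[str] = None  # used when the caller supplies no value
    required: bool = False


@dataclass
class ToolDescriptor:
    """Engine-independent description of a command-line tool."""
    executable: str
    parameters: List[Parameter] = field(default_factory=list)

    def command_line(self, **values: str) -> List[str]:
        """Build an argument list from supplied values, falling back to defaults."""
        args = [self.executable]
        for param in self.parameters:
            value = values.get(param.name, param.default)
            if value is None:
                if param.required:
                    raise ValueError(f"missing required parameter: {param.name}")
                continue  # optional parameter without a value is omitted
            # Assumed flag convention: "--<name> <value>"
            args += [f"--{param.name}", value]
        return args


# Hypothetical tool with one input, one output, and one tunable option.
tool = ToolDescriptor(
    executable="example_tool",
    parameters=[
        Parameter("input", "input", required=True),
        Parameter("output", "output", required=True),
        Parameter("threshold", "option", default="0.5"),
    ],
)

print(tool.command_line(input="a.txt", output="b.txt"))
```

Because the description is plain data, a desktop editor and a remote execution framework could each consume the same descriptor and generate their own node or job definition from it, which is the kind of interoperability the abstract describes.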

CONCLUSIONS

Our work will not only reduce time spent on designing scientific workflows, but also make executing workflows on remote high-performance computing resources more accessible to technically inexperienced users. We strongly believe that our efforts not only decrease the turnaround time to obtain scientific results but also have a positive impact on reproducibility, thus elevating the quality of obtained scientific results.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d898/4788856/a537b01c9ee4/12859_2016_978_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d898/4788856/0018c926c800/12859_2016_978_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d898/4788856/9f4fdce18816/12859_2016_978_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d898/4788856/3e135e1cac73/12859_2016_978_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d898/4788856/cf658d29c869/12859_2016_978_Fig5_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d898/4788856/f549e0e9a368/12859_2016_978_Fig6_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d898/4788856/6d7781c4f1d2/12859_2016_978_Fig7_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d898/4788856/0474d7f622fa/12859_2016_978_Fig8_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d898/4788856/0f097e8dac9f/12859_2016_978_Fig9_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d898/4788856/773bbad86482/12859_2016_978_Fig10_HTML.jpg

Similar articles

1
From the desktop to the grid: scalable bioinformatics via workflow conversion.
BMC Bioinformatics. 2016 Mar 12;17:127. doi: 10.1186/s12859-016-0978-9.
2
Tavaxy: integrating Taverna and Galaxy workflows with cloud computing support.
BMC Bioinformatics. 2012 May 4;13:77. doi: 10.1186/1471-2105-13-77.
3
Support for Taverna workflows in the VPH-Share cloud platform.
Comput Methods Programs Biomed. 2017 Jul;146:37-46. doi: 10.1016/j.cmpb.2017.05.006. Epub 2017 May 20.
4
Simplifying the development of portable, scalable, and reproducible workflows.
Elife. 2021 Oct 13;10:e71069. doi: 10.7554/eLife.71069.
5
qPortal: A platform for data-driven biomedical research.
PLoS One. 2018 Jan 19;13(1):e0191603. doi: 10.1371/journal.pone.0191603. eCollection 2018.
6
Biowep: a workflow enactment portal for bioinformatics applications.
BMC Bioinformatics. 2007 Mar 8;8 Suppl 1(Suppl 1):S19. doi: 10.1186/1471-2105-8-S1-S19.
7
High performance workflow implementation for protein surface characterization using grid technology.
BMC Bioinformatics. 2005 Dec 1;6 Suppl 4(Suppl 4):S19. doi: 10.1186/1471-2105-6-S4-S19.
8
JMS: An Open Source Workflow Management System and Web-Based Cluster Front-End for High Performance Computing.
PLoS One. 2015 Aug 17;10(8):e0134273. doi: 10.1371/journal.pone.0134273. eCollection 2015.
9
The Dockstore: enhancing a community platform for sharing reproducible and accessible computational protocols.
Nucleic Acids Res. 2021 Jul 2;49(W1):W624-W632. doi: 10.1093/nar/gkab346.
10
Workflows for microarray data processing in the Kepler environment.
BMC Bioinformatics. 2012 May 17;13:102. doi: 10.1186/1471-2105-13-102.

Cited by

1
Selection of computational environments for PSP processing on scientific gateways.
Heliyon. 2018 Jul 17;4(7):e00690. doi: 10.1016/j.heliyon.2018.e00690. eCollection 2018 Jul.
2
Data Integration for Future Medicine (DIFUTURE).
Methods Inf Med. 2018 Jul;57(S 01):e57-e65. doi: 10.3414/ME17-02-0022. Epub 2018 Jul 17.
3
Closha: bioinformatics workflow system for the analysis of massive sequencing data.
