Reproducible Large-Scale Neuroimaging Studies with the OpenMOLE Workflow Management System.

Authors

Passerat-Palmbach Jonathan, Reuillon Romain, Leclaire Mathieu, Makropoulos Antonios, Robinson Emma C, Parisot Sarah, Rueckert Daniel

Affiliations

BioMedIA Group, Department of Computing, Imperial College London, London, UK.

Institut des Systèmes Complexes Paris Île-de-France, Paris, France.

Publication information

Front Neuroinform. 2017 Mar 22;11:21. doi: 10.3389/fninf.2017.00021. eCollection 2017.

DOI: 10.3389/fninf.2017.00021

PMID: 28381997

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5361107/

Abstract

OpenMOLE is a scientific workflow engine with a strong emphasis on workload distribution. Workflows are designed using a high-level Domain Specific Language (DSL) built on top of Scala. The DSL exposes natural parallelism constructs that make it easy to delegate the workload resulting from a workflow to a wide range of distributed computing environments, and it hides much of the complexity of designing large experiments. Users can embed their own applications and scale their pipelines from a small prototype running on a desktop computer to a large-scale study harnessing distributed computing infrastructures, simply by changing a single line in the pipeline definition. The construction of the pipeline itself is decoupled from the execution context, and the high-level DSL abstracts the underlying execution environment, unlike classic shell-script-based pipelines. These two aspects allow pipelines to be shared and studies to be replicated across different computing environments. Workflows can be run as traditional batch pipelines or coupled with OpenMOLE's advanced exploration methods in order to study the behavior of an application or to perform automatic parameter tuning. In this work, we briefly present the main strengths of OpenMOLE and detail recent improvements targeting the re-executability of workflows across various Linux platforms. We have tightly coupled OpenMOLE with CARE, a standalone containerization solution that allows any application packaged on one Linux host to be re-executed on another Linux host. The solution is evaluated against a Python-based pipeline involving packages such as scikit-learn as well as binary dependencies. All were packaged and re-executed successfully on various HPC environments, with identical numerical results (here, prediction scores) obtained on each environment. Our results show that the pair formed by OpenMOLE and CARE is a reliable solution for generating reproducible results and re-executable pipelines. A demonstration of the flexibility of our solution showcases three neuroimaging pipelines harnessing distributed computing environments as heterogeneous as local clusters and the European Grid Infrastructure (EGI).
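The idea of decoupling a pipeline definition from its execution context can be illustrated with a minimal Python sketch. This is not OpenMOLE's Scala DSL and does not use any OpenMOLE API: the names `SerialExecutor`, `score`, and `run_pipeline`, as well as the toy data, are hypothetical and chosen only for illustration. The sketch assumes that an "environment" can be modelled as a standard-library executor, so that moving from a desktop-style run to a parallel one means changing only the line that selects the executor.

```python
from concurrent.futures import Executor, ThreadPoolExecutor


class SerialExecutor(Executor):
    """Hypothetical stand-in for a desktop run: executes tasks one by one."""

    def map(self, fn, *iterables, timeout=None, chunksize=1):
        # Evaluate eagerly and in order, with no worker threads involved.
        return iter([fn(*args) for args in zip(*iterables)])


def score(threshold):
    """Toy task standing in for one pipeline step (not taken from the paper)."""
    data = [0.1, 0.4, 0.35, 0.8]
    return sum(1 for x in data if x >= threshold) / len(data)


def run_pipeline(executor, thresholds):
    """The pipeline definition never mentions where the tasks execute."""
    return list(executor.map(score, thresholds))


thresholds = [0.2, 0.5, 0.9]

# Prototype run: tasks execute sequentially in the current process.
local_results = run_pipeline(SerialExecutor(), thresholds)

# Larger run: only the choice of executor changes, not the pipeline.
with ThreadPoolExecutor(max_workers=4) as pool:
    pooled_results = run_pipeline(pool, thresholds)

# Both environments should yield the same numerical results.
print(local_results == pooled_results)
```

Because `run_pipeline` only depends on the executor interface, the same definition can be shared and re-run elsewhere. In OpenMOLE itself this role is played by the environment objects of the DSL, and CARE packaging addresses the separate problem of making the embedded application behave identically on each host.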

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/09ff/5361107/c903fcb545ac/fninf-11-00021-g0001.jpg
