Department of Medical Sciences, Clinical Chemistry, Uppsala University, Uppsala, Sweden.
European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Cambridge, UK.
Bioinformatics. 2019 Oct 1;35(19):3752-3760. doi: 10.1093/bioinformatics/btz160.
Developing a robust and performant data analysis workflow that integrates all necessary components whilst still being able to scale over multiple compute nodes is a challenging task. We introduce a generic method based on the microservice architecture, where software tools are encapsulated as Docker containers that can be connected into scientific workflows and executed using the Kubernetes container orchestrator.
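As a rough illustration of the container-per-tool idea, the minimal sketch below uses only the Python standard library to build Kubernetes `batch/v1` Job manifests, one per input sample, for a single containerised preprocessing tool. Several details are hypothetical placeholders, not PhenoMeNal components: the image name `example.org/peak-picker:1.0`, its `--in`/`--out` flags, the `/data` directory layout and the `workflow-data` PersistentVolumeClaim. In practice a workflow system would normally generate and chain such jobs rather than a hand-written script.

```python
import json


def tool_job(name: str, image: str, args: list[str],
             cpu: str = "1", memory: str = "2Gi") -> dict:
    """Return a Kubernetes batch/v1 Job manifest that runs one containerised tool once."""
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": name},
        "spec": {
            "backoffLimit": 2,  # let Kubernetes retry a failed pod at most twice
            "template": {
                "spec": {
                    "restartPolicy": "Never",  # batch tools run to completion
                    "containers": [{
                        "name": "tool",
                        "image": image,
                        "args": args,
                        # resource requests let the scheduler place pods on nodes with free capacity
                        "resources": {"requests": {"cpu": cpu, "memory": memory}},
                        "volumeMounts": [{"name": "data", "mountPath": "/data"}],
                    }],
                    # hypothetical shared volume through which workflow steps exchange files
                    "volumes": [{"name": "data",
                                 "persistentVolumeClaim": {"claimName": "workflow-data"}}],
                }
            },
        },
    }


def preprocessing_jobs(samples: list[str]) -> list[dict]:
    """Build one independent Job per raw data file (hypothetical image and CLI flags)."""
    return [
        tool_job(
            name=f"peakpick-{i}",
            image="example.org/peak-picker:1.0",
            args=["--in", f"/data/raw/{s}", "--out", f"/data/peaks/{s}"],
        )
        for i, s in enumerate(samples)
    ]


if __name__ == "__main__":
    for job in preprocessing_jobs(["sample1.mzML", "sample2.mzML"]):
        print(json.dumps(job, indent=2))
```

Each sample becomes an independent Job, so the Kubernetes scheduler can spread the Jobs across whatever nodes have free capacity. This is one simple way the per-sample part of a workflow can scale as compute is added. Each printed manifest could be written to its own file and submitted with `kubectl apply -f <file>`.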
We developed a Virtual Research Environment (VRE) that facilitates the rapid integration of new tools and the development of scalable, interoperable workflows for metabolomics data analysis. The environment can be launched on demand on cloud resources and on desktop computers. IT-expertise requirements on the user side are kept to a minimum, and workflows can be reused effortlessly, even by novice users. We validate our method in the field of metabolomics on two mass spectrometry studies, one nuclear magnetic resonance spectroscopy study and one fluxomics study. We show that the method scales dynamically as more computational resources become available. We demonstrate that the method facilitates interoperability by integrating the major software suites, resulting in a turn-key workflow that encompasses all steps of mass-spectrometry-based metabolomics, including preprocessing, statistics and identification. The microservice approach is a generic methodology that can serve any scientific discipline and opens up new types of large-scale integrative science.
The PhenoMeNal consortium maintains a web portal (https://portal.phenomenal-h2020.eu) providing a GUI for launching the Virtual Research Environment. The GitHub repository https://github.com/phnmnl/ hosts the source code of all projects.
Supplementary data are available at Bioinformatics online.