Metadata practices for simulation workflows.

Authors

Villamar José, Kelbling Matthias, More Heather L, Denker Michael, Tetzlaff Tom, Senk Johanna, Thober Stephan

Affiliations

Institute for Advanced Simulation (IAS-6), Jülich Research Centre, Jülich, Germany.

RWTH Aachen University, Aachen, Germany.

Publication information

Sci Data. 2025 Jun 5;12(1):942. doi: 10.1038/s41597-025-05126-1.

DOI:10.1038/s41597-025-05126-1

PMID:40473681

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC12141434/

Abstract

Computer simulations are an essential pillar of knowledge generation in science. Exploring, understanding, reproducing, and sharing the results of simulations relies on tracking and organizing the metadata describing the numerical experiments. The models used to understand real-world systems, and the computational machinery required to simulate them, are typically complex, and produce large amounts of heterogeneous metadata. Here, we present general practices for acquiring and handling metadata that are agnostic to software and hardware, and highly flexible for the user. These consist of two steps: 1) recording and storing raw metadata, and 2) selecting and structuring metadata. As a proof of concept, we develop the Archivist, a Python tool to help with the second step, and use it to apply our practices to distinct high-performance computing use cases from neuroscience and hydrology. Our practices and the Archivist can readily be applied to existing workflows without the need for substantial restructuring. They support sustainable numerical workflows, fostering replicability, reproducibility, data exploration, and data sharing in simulation-based research.
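The two steps named in the abstract can be illustrated with a minimal Python sketch. This is not the Archivist's API, which the abstract does not describe. The function names (`record_raw_metadata`, `select_and_structure`, `demo`), the file layout, and the example parameters are all hypothetical and chosen only for illustration. The sketch assumes raw metadata is written as one JSON file per source and that the user supplies a selection of keys to keep.

```python
import json
import platform
import sys
import tempfile
from pathlib import Path


def record_raw_metadata(raw_dir: Path, parameters: dict) -> None:
    """Step 1 (hypothetical helper): store raw metadata unfiltered, one file per source."""
    raw_dir.mkdir(parents=True, exist_ok=True)
    environment = {
        "python_version": sys.version,
        "platform": platform.platform(),
        "machine": platform.machine(),
    }
    (raw_dir / "environment.json").write_text(json.dumps(environment, indent=2))
    (raw_dir / "parameters.json").write_text(json.dumps(parameters, indent=2))


def select_and_structure(raw_dir: Path, selection: dict) -> dict:
    """Step 2 (hypothetical helper): keep only the selected keys from each raw file."""
    structured = {}
    for source, keys in selection.items():
        raw = json.loads((raw_dir / f"{source}.json").read_text())
        # Keys missing from the raw record are skipped rather than raising.
        structured[source] = {key: raw[key] for key in keys if key in raw}
    return structured


def demo() -> dict:
    """Run both steps on made-up simulation parameters in a temporary directory."""
    with tempfile.TemporaryDirectory() as tmp:
        raw_dir = Path(tmp) / "raw_metadata"
        # The parameter names and values below are invented for the example.
        record_raw_metadata(raw_dir, {"seed": 1234, "dt_ms": 0.1, "n_neurons": 1000})
        selection = {"environment": ["platform"], "parameters": ["seed", "dt_ms"]}
        return select_and_structure(raw_dir, selection)


if __name__ == "__main__":
    print(json.dumps(demo(), indent=2))
```

Keeping the raw record separate from the structured selection means the selection can be changed later without rerunning the simulation, which is one plausible reading of why the two steps are split.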

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1214/12141434/091e4605de6a/41597_2025_5126_Fig1_HTML.jpg

Similar articles

Metadata practices for simulation workflows.

Sci Data. 2025 Jun 5;12(1):942. doi: 10.1038/s41597-025-05126-1.

Workflow sharing with automated metadata validation and test execution to improve the reusability of published workflows.

Gigascience. 2022 Dec 28;12. doi: 10.1093/gigascience/giad006. Epub 2023 Feb 22.

Integrated workflows for spiking neuronal network simulations.

Front Neuroinform. 2013 Dec 10;7:34. doi: 10.3389/fninf.2013.00034. eCollection 2013.

Exploring Parameter and Hyper-Parameter Spaces of Neuroscience Models on High Performance Computers With Learning to Learn.

Front Comput Neurosci. 2022 May 27;16:885207. doi: 10.3389/fncom.2022.885207. eCollection 2022.

Ibaqpy: A scalable Python package for baseline quantification in proteomics leveraging SDRF metadata.

J Proteomics. 2025 Jun 15;317:105440. doi: 10.1016/j.jprot.2025.105440. Epub 2025 Apr 21.

Microbench: automated metadata management for systems biology benchmarking and reproducibility in Python.

Bioinformatics. 2022 Oct 14;38(20):4823-4825. doi: 10.1093/bioinformatics/btac580.

odML-Tables as a Metadata Standard in Microneurography.

Stud Health Technol Inform. 2023 Sep 12;307:3-11. doi: 10.3233/SHTI230687.

The role of metadata in reproducible computational research.

Patterns (N Y). 2021 Sep 10;2(9):100322. doi: 10.1016/j.patter.2021.100322.

Handling Metadata in a Neurophysiology Laboratory.

Front Neuroinform. 2016 Jul 19;10:26. doi: 10.3389/fninf.2016.00026. eCollection 2016.

Sharing and organizing research products as R packages.

Behav Res Methods. 2021 Apr;53(2):792-802. doi: 10.3758/s13428-020-01436-x.
