Villamar José, Kelbling Matthias, More Heather L, Denker Michael, Tetzlaff Tom, Senk Johanna, Thober Stephan
Institute for Advanced Simulation (IAS-6), Jülich Research Centre, Jülich, Germany.
RWTH Aachen University, Aachen, Germany.
Sci Data. 2025 Jun 5;12(1):942. doi: 10.1038/s41597-025-05126-1.
Computer simulations are an essential pillar of knowledge generation in science. Exploring, understanding, reproducing, and sharing the results of simulations rely on tracking and organizing the metadata describing the numerical experiments. The models used to understand real-world systems, and the computational machinery required to simulate them, are typically complex and produce large amounts of heterogeneous metadata. Here, we present general practices for acquiring and handling metadata that are agnostic to software and hardware, and highly flexible for the user. These practices consist of two steps: 1) recording and storing raw metadata, and 2) selecting and structuring metadata. As a proof of concept, we develop the Archivist, a Python tool to help with the second step, and use it to apply our practices to distinct high-performance computing use cases from neuroscience and hydrology. Our practices and the Archivist can readily be applied to existing workflows without the need for substantial restructuring. They support sustainable numerical workflows, fostering replicability, reproducibility, data exploration, and data sharing in simulation-based research.
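The two-step practice described in the abstract can be illustrated with a minimal Python sketch. This is not the Archivist's API: the function names (`record_raw_metadata`, `select_and_structure`), the JSON file layout, and the example parameters (`dt`, `seed`, `n_neurons`) are assumptions made purely for illustration, using only the standard library.

```python
import json
import platform
import sys
import tempfile
from pathlib import Path


def record_raw_metadata(raw_dir, parameters):
    """Step 1 (hypothetical helper): store raw metadata unfiltered, one file per source."""
    raw_dir = Path(raw_dir)
    raw_dir.mkdir(parents=True, exist_ok=True)
    environment = {
        "python_version": platform.python_version(),
        "system": platform.system(),
        "machine": platform.machine(),
        "argv": sys.argv,
    }
    (raw_dir / "environment.json").write_text(json.dumps(environment))
    (raw_dir / "parameters.json").write_text(json.dumps(parameters))
    return raw_dir


def select_and_structure(raw_dir, wanted):
    """Step 2 (hypothetical helper): keep only the requested keys, grouped by source file."""
    structured = {}
    for path in sorted(Path(raw_dir).glob("*.json")):
        raw = json.loads(path.read_text())
        keys = wanted.get(path.stem, [])
        structured[path.stem] = {k: raw[k] for k in keys if k in raw}
    return structured


def demo():
    # Assumed example parameters of a numerical experiment.
    parameters = {"dt": 0.1, "seed": 42, "n_neurons": 1000}
    with tempfile.TemporaryDirectory() as tmp:
        raw_dir = record_raw_metadata(Path(tmp) / "raw", parameters)
        return select_and_structure(
            raw_dir,
            {"environment": ["python_version"], "parameters": ["dt", "seed"]},
        )


if __name__ == "__main__":
    print(json.dumps(demo(), indent=2))
```

Keeping the raw files untouched and deferring selection to a second pass means the selection can be revised later without rerunning the simulation, which is one plausible reading of why the two steps are kept separate.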