BBMRI-ERIC, Neue Stiftingtalstrasse 2, 8010, Graz, Austria.
Faculty of Informatics, Masaryk University, Botanická 68a, 602 00, Brno, Czech Republic.
Sci Data. 2022 Aug 17;9(1):503. doi: 10.1038/s41597-022-01537-6.
Provenance is information describing the lineage of an object, such as a dataset or biological material. Since these objects can be passed between organizations, each organization can document only parts of the object's life cycle. As a result, the interconnection of distributed provenance parts forms distributed provenance chains. Depending on the actual provenance content, complete provenance chains can provide traceability and contribute to the reproducibility and FAIRness of research objects. In this paper, we define a lightweight provenance model based on W3C PROV that enables the generation of distributed provenance chains in complex, multi-organizational environments. The application of the model is demonstrated with a use case spanning several steps of a real-world research pipeline: starting with the acquisition of a specimen, continuing through its processing and storage, histological examination, and the generation or collection of associated data (images, annotations, clinical data), and ending with the training of an AI model to detect tumors in the images. The proposed model has become an open conceptual foundation of the ISO 23494 standard on provenance for the biotechnology domain, which is currently under development.
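To make the idea of a distributed provenance chain more concrete, the following is a minimal sketch in plain Python, not the model defined in the paper. It assumes each organization records only its own derivation links (loosely in the spirit of the W3C PROV "was derived from" relation) and that entity identifiers are shared across organizations. The `Bundle` class, the `trace` function, and the organization and entity names are all hypothetical and chosen only to mirror the use case described above.

```python
from dataclasses import dataclass, field


@dataclass
class Bundle:
    """One organization's part of the provenance (hypothetical structure)."""
    organization: str
    # Maps each entity id to the entity id it was derived from (None for a root).
    derived_from: dict = field(default_factory=dict)


def trace(entity_id, bundles):
    """Walk derivation links backwards across all bundles, newest entity first."""
    # Merge the per-organization parts into a single lookup table.
    index = {}
    for bundle in bundles:
        for entity, source in bundle.derived_from.items():
            index[entity] = (bundle.organization, source)

    chain = []
    seen = set()  # guards against accidental cycles in the records
    current = entity_id
    while current is not None and current in index and current not in seen:
        seen.add(current)
        organization, source = index[current]
        chain.append((current, organization))
        current = source
    return chain


# Hypothetical organizations, each documenting only its own steps.
biobank = Bundle("biobank", {"specimen": None, "slide": "specimen"})
pathology = Bundle("pathology-lab", {"image": "slide", "annotation": "image"})
ai_group = Bundle("ai-group", {"model": "annotation"})

chain = trace("model", [biobank, pathology, ai_group])
for entity, organization in chain:
    print(f"{entity} (documented by {organization})")
```

No single bundle in this sketch can trace the trained model back to the specimen; the full chain only appears once the parts held by the different organizations are connected through shared identifiers.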