Provenance of specimen and data - A prerequisite for AI development in computational pathology.

Affiliations

Medical University of Graz, Neue Stiftingtalstraße 6, 8010 Graz, Austria.

Masaryk University, Šumavská 416/15, 602 00 Brno, Czechia.

Publication information

N Biotechnol. 2023 Dec 25;78:22-28. doi: 10.1016/j.nbt.2023.09.006. Epub 2023 Sep 25.

DOI:10.1016/j.nbt.2023.09.006

PMID:37758054

Abstract

AI development in biotechnology relies on high-quality data to train and validate algorithms. The FAIR principles (Findable, Accessible, Interoperable, and Reusable) and regulatory frameworks such as the In Vitro Diagnostic Regulation (IVDR) and the Medical Device Regulation (MDR) specify requirements on specimen and data provenance to ensure the quality and traceability of data used in AI development. In this paper, a framework is presented for recording and publishing provenance information to meet these requirements. The framework is based on the use of standardized models and protocols, such as the W3C PROV model and the ISO 23494 series, to capture and record provenance information at various stages of the data generation and analysis process. The framework and use case illustrate the role of provenance information in supporting the development of high-quality AI algorithms in biotechnology. Finally, the principles of the framework are illustrated in a simple computational pathology use case, showing how specimen and data provenance can be used in the development and documentation of an AI algorithm. The use case demonstrates the importance of managing and integrating distributed provenance information and highlights the complex task of considering factors such as semantic interoperability, confidentiality, and the verification of authenticity and integrity.
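
The kind of record such a framework captures can be illustrated with a minimal sketch. It is not the authors' implementation and does not reproduce the ISO 23494 model; it only assumes a W3C PROV-style description, loosely following the PROV-JSON layout, of a specimen, a scanning activity, and the resulting whole-slide image, plus a SHA-256 digest used to check the image's integrity. The `ex:` namespace, the identifiers, the `ex:sha256` attribute, and the function names are all hypothetical.

```python
import hashlib
import hmac
import json


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of the data as a hex string."""
    return hashlib.sha256(data).hexdigest()


def build_provenance(specimen_id: str, image_bytes: bytes) -> dict:
    """Build a small PROV-style record (hypothetical identifiers throughout).

    The record states that a whole-slide image was generated by a scanning
    activity that used a specimen, and that the image derives from the specimen.
    """
    specimen = f"ex:specimen-{specimen_id}"
    image = f"ex:wsi-{specimen_id}"
    scanning = f"ex:scanning-{specimen_id}"
    return {
        "prefix": {"ex": "https://example.org/"},  # placeholder namespace
        "entity": {
            specimen: {"prov:type": "ex:Specimen"},
            image: {
                "prov:type": "ex:WholeSlideImage",
                # Digest stored with the record so the image can be checked later.
                "ex:sha256": sha256_hex(image_bytes),
            },
        },
        "activity": {scanning: {"prov:type": "ex:SlideScanning"}},
        "used": {"_:u1": {"prov:activity": scanning, "prov:entity": specimen}},
        "wasGeneratedBy": {"_:g1": {"prov:entity": image, "prov:activity": scanning}},
        "wasDerivedFrom": {
            "_:d1": {"prov:generatedEntity": image, "prov:usedEntity": specimen}
        },
    }


def verify_integrity(record: dict, image_id: str, image_bytes: bytes) -> bool:
    """Check that the image bytes still match the digest stored in the record."""
    expected = record["entity"][image_id]["ex:sha256"]
    return hmac.compare_digest(expected, sha256_hex(image_bytes))


if __name__ == "__main__":
    record = build_provenance("S1", b"example image bytes")
    print(json.dumps(record, indent=2))
    print(verify_integrity(record, "ex:wsi-S1", b"example image bytes"))
```

A digest comparison of this kind only detects changes to the data; establishing authenticity would additionally require something like a digital signature over the record, which this sketch leaves out.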