Bisson Tom, Franz Michael, Dogan O Isil, Romberg Daniel, Jansen Christoph, Hufnagl Peter, Zerbe Norman
Institute of Pathology, Charité - Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Berlin, Germany.
Fraunhofer Institute for Digital Medicine MEVIS, Bremen, Germany.
Digit Health. 2023 May 9;9:20552076231171475. doi: 10.1177/20552076231171475. eCollection 2023 Jan-Dec.
The exchange of health-related data is subject to regional laws and regulations, such as the General Data Protection Regulation (GDPR) in the EU or the Health Insurance Portability and Accountability Act (HIPAA) in the United States, which poses non-trivial challenges for researchers and educators working with these data. In pathology, the digitization of diagnostic tissue samples inevitably generates identifying data, which can comprise sensitive as well as acquisition-related information and is stored in vendor-specific file formats. Whole Slide Images (WSIs) are usually distributed and used outside the clinic in these formats, because industry-wide standards such as DICOM have so far been adopted only tentatively and slide scanner vendors currently do not provide anonymization functionality.
We developed a guideline for the proper handling of histopathological image data, particularly in research and education, with regard to the GDPR. In this context, we evaluated existing anonymization methods and examined proprietary format specifications to identify all sensitive information for the most common WSI formats. This work resulted in a software library that enables GDPR-compliant anonymization of WSIs while preserving the native formats.
Based on the analysis of proprietary formats, all occurrences of sensitive information were identified for file formats frequently used in clinical routine, and finally, an open-source programming library with an executable CLI tool and wrappers for different programming languages was developed.
Our analysis showed that there is no straightforward software solution to anonymize WSIs in a GDPR-compliant way while maintaining the data format. We closed this gap with our extensible open-source library that works instantaneously and offline.