A Method for Efficient De-identification of DICOM Metadata and Burned-in Pixel Text.

Affiliations

Department of Radiology, Duke University, Durham, NC, USA.

School of Medicine, Duke University, Durham, NC, USA.

Publication information

J Imaging Inform Med. 2024 Oct;37(5):1-7. doi: 10.1007/s10278-024-01098-7. Epub 2024 Apr 8.

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11522224/

Abstract

De-identification of DICOM images is an essential component of medical image research. While many established methods exist for the safe removal of protected health information (PHI) in DICOM metadata, approaches for the removal of PHI "burned in" to image pixel data are typically manual, and automated high-throughput approaches are not well validated. Emerging optical character recognition (OCR) models can potentially detect and remove PHI-bearing text from medical images but are very time-consuming to run on the high volume of images found in typical research studies. We present a data processing method that performs metadata de-identification for all images, combined with a targeted approach that applies OCR only to images with a high likelihood of burned-in text. The method was validated on a dataset of 415,182 images across ten modalities, representative of the de-identification requests submitted at our institution over a 20-year span. Of the 12,578 images in this dataset with burned-in text of any kind, only 10 passed undetected with the method. OCR was required for only 6,050 images (1.5% of the dataset).
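The two-stage idea in the abstract (scrub metadata for every image, run OCR only on images likely to carry burned-in text) can be illustrated with a minimal sketch. The abstract does not state how the likelihood of burned-in text is estimated, so the screening rule here is purely an assumption: the modality set `TEXT_PRONE_MODALITIES`, the short `PHI_KEYWORDS` list, and the function names are all hypothetical. Only the attribute keywords (`BurnedInAnnotation`, `Modality`, `ImageType`, `PatientName`, and so on) come from the DICOM standard. Headers are modeled as plain dictionaries to keep the example self-contained.

```python
from dataclasses import dataclass

# Hypothetical screening rule: these modality codes are an illustrative
# assumption, not the criteria used in the paper.
TEXT_PRONE_MODALITIES = {"US", "SC", "OT"}

# Illustrative subset of PHI-bearing attributes; a real profile covers many more.
PHI_KEYWORDS = ("PatientName", "PatientID", "PatientBirthDate", "AccessionNumber")


@dataclass
class RoutedImage:
    metadata: dict   # header after metadata de-identification
    needs_ocr: bool  # True if the image should go on to the slow OCR stage


def scrub_metadata(metadata: dict) -> dict:
    """Return a copy of the header with the listed PHI attributes removed."""
    return {k: v for k, v in metadata.items() if k not in PHI_KEYWORDS}


def likely_has_burned_in_text(metadata: dict) -> bool:
    """Cheap header-only screen deciding whether OCR is worth running."""
    if str(metadata.get("BurnedInAnnotation", "")).upper() == "YES":
        return True
    if metadata.get("Modality") in TEXT_PRONE_MODALITIES:
        return True
    # Secondary captures are often screenshots or scanned documents.
    return "SECONDARY" in metadata.get("ImageType", [])


def route(metadata: dict) -> RoutedImage:
    """Scrub every header; flag only the likely text-bearing images for OCR."""
    return RoutedImage(
        metadata=scrub_metadata(metadata),
        needs_ocr=likely_has_burned_in_text(metadata),
    )


if __name__ == "__main__":
    ct = {"PatientName": "DOE^JANE", "Modality": "CT",
          "ImageType": ["ORIGINAL", "PRIMARY"]}
    us = {"PatientID": "123", "Modality": "US"}
    for header in (ct, us):
        routed = route(header)
        print(routed.metadata.get("Modality"), "needs OCR:", routed.needs_ocr)
```

For scale, the figures in the abstract work out to 6,050 / 415,182 ≈ 1.46% of images routed to OCR, and 10 / 12,578 ≈ 0.08% of text-bearing images passing undetected.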
