Lee Taehee, Kim Hyungjin, Park Seong Ho, Chae Seonhye, Yoon Soon Ho
Department of Radiology, Seoul National University Hospital and College of Medicine, 101 Daehak-ro, Jongno-gu, Seoul 03080, Korea.
Department of Radiology and Research Institute of Radiology, Asan Medical Center, University of Ulsan College of Medicine, Seoul, Korea.
Radiology. 2025 Jun;315(3):e243664. doi: 10.1148/radiol.243664.
Background Advances in vision-language models (VLMs) may enable detection and deidentification of burned-in protected health information (PHI) on medical images.

Purpose To investigate the ability of commercial and open-source VLMs to detect burned-in PHI on medical images, confirm full deidentification, and obscure PHI where present.

Materials and Methods In this retrospective study, records of deceased patients aged 18 years or older who died during admission at a tertiary hospital between January and June 2021 were randomly selected. One study per imaging modality was randomly selected. Images were preprocessed to ensure the presence of burned-in PHI and to create four deidentification scenarios: all PHI text visible, PHI text redacted with asterisks, PHI text removed, and all text removed. Real PHI was replaced with fictitious data to protect privacy. Four VLMs (three commercial: ChatGPT-4o [OpenAI], Gemini 1.5 Pro [Google], and Claude-3 Haiku [Anthropic]; one open-source: Llama 3.2 Vision 11B [Meta]) were tested on three tasks: task 1, overall confirmation of deidentification; task 2, detection and specification of any identifiable PHI items; and task 3, detection and specification of the five preselected PHI items (name, identification number, date of birth, age, and sex). Text was also extracted from the images with the open-source Tesseract optical character recognition (OCR) software and input into the VLMs for the same tasks. Additionally, the capability of each VLM to mask detected PHI fields was evaluated. Statistical comparisons were conducted using χ² tests, independent t tests, or generalized estimating equations.

Results Data from 100 randomly selected deceased patients (mean age, 71.1 years ± 10.1 [SD]; 57 male) with 709 imaging studies were included. Among 6696 PHI occurrences, ChatGPT-4o achieved a deidentification verification accuracy of 95.0% (n = 6362) for task 1, 61.2% (n = 4098) for task 2, and 96.2% (n = 6441) for task 3, outperforming Gemini 1.5 Pro (68.1%, 55.2%, and 86.3% for tasks 1-3, respectively), Claude-3 Haiku (75.8%, 86.9%, and 79.4% for tasks 1-3, respectively), and Llama 3.2 Vision 11B (51.6%, 66.9%, and 74.3% for tasks 1-3, respectively) (P < .001 for all). Direct image analysis by ChatGPT-4o and Gemini 1.5 Pro was more accurate than the OCR software for PHI detection across all three deidentification verification tasks (P < .001 for all). Among 375 PHI occurrences on 100 images, ChatGPT-4o successfully obscured 81.1% (n = 304).

Conclusion ChatGPT-4o demonstrated substantial potential for detecting, verifying, and obscuring burned-in PHI on medical images.

© RSNA, 2025. See also the editorial by Pinto dos Santos in this issue.
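As a concrete illustration of the pipeline described above (Tesseract OCR as a text-extraction baseline, plus direct image analysis by a VLM), the following minimal Python sketch queries ChatGPT-4o for the five preselected PHI items on a single image. The prompt wording, model identifier, and file path are illustrative assumptions; the study's actual prompts and settings are not given in the abstract.

```python
# Illustrative sketch only: the study's actual prompts, parameters, and
# evaluation harness are not reported in the abstract. The model name,
# prompt text, and file path below are assumptions.
import base64

import pytesseract
from PIL import Image
from openai import OpenAI

IMAGE_PATH = "study_image.png"  # hypothetical test image with burned-in text

# Baseline: extract burned-in text with the open-source Tesseract OCR engine,
# as the study did before feeding the extracted text to each VLM.
ocr_text = pytesseract.image_to_string(Image.open(IMAGE_PATH))

# Direct image analysis: ask a VLM (here, ChatGPT-4o via the OpenAI API)
# to detect and specify preselected PHI items (analogous to task 3).
client = OpenAI()  # reads OPENAI_API_KEY from the environment
with open(IMAGE_PATH, "rb") as f:
    b64 = base64.b64encode(f.read()).decode()

prompt = (
    "Examine this medical image for burned-in protected health information. "
    "List any of the following items that are visible: name, identification "
    "number, date of birth, age, sex. If none are visible, reply 'deidentified'."
)
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url",
             "image_url": {"url": f"data:image/png;base64,{b64}"}},
        ],
    }],
)
print("OCR baseline:", ocr_text)
print("VLM response:", response.choices[0].message.content)
```

The masking capability evaluated in the study (obscuring detected PHI fields on the image itself) is not shown here, as the abstract does not specify how the masked outputs were produced.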