Automated anonymization of radiology reports: comparison of publicly available natural language processing and large language models.

Author Information

Langenbach Marcel C, Foldyna Borek, Hadzic Ibrahim, Langenbach Isabel L, Raghu Vineet K, Lu Michael T, Neilan Tomas G, Heemelaar Julius C

Affiliations

Cardiovascular Imaging Research Center, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA.

Institute for Diagnostic and Interventional Radiology, University Hospital Cologne, Cologne, Germany.

Publication Information

Eur Radiol. 2025 May;35(5):2634-2641. doi: 10.1007/s00330-024-11148-x. Epub 2024 Oct 31.

Abstract

PURPOSE

Medical reports, governed by HIPAA regulations, contain personal health information (PHI), which restricts secondary data use. Using publicly available natural language processing (NLP) and large language model (LLM) methods, we sought to automatically anonymize PHI in free-text radiology reports.

MATERIALS AND METHODS

We compared two publicly available rule-based NLP models (spaCy: NLP₁, accuracy-optimized; NLP₂, speed-optimized; both iteratively improved on a test set of 400 free-text CT reports) and one offline LLM approach (LLM model: LLaMA-2, Meta AI) for PHI anonymization. The three models were tested on 100 randomly selected chest CT reports. Two investigators assessed the anonymization of all occurring PHI entities and whether any clinical information had been removed. Precision, recall, and F1-scores were then calculated.
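
For illustration, the sketch below outlines how such a rule-based redaction step could look, assuming spaCy's standard English pipelines, standard PERSON/DATE entity labels, and simple regular expressions for record identifiers; the model names, identifier formats, and placeholder tags are illustrative assumptions and do not reproduce the authors' exact implementation.

    # Minimal sketch of rule-based PHI redaction with spaCy (illustrative only).
    import re

    import spacy

    # Two pipelines mirroring the accuracy- vs speed-optimized setup described above
    # (assumed model names; en_core_web_trf additionally requires spacy-transformers).
    nlp_accurate = spacy.load("en_core_web_trf")
    nlp_fast = spacy.load("en_core_web_sm")

    # Hypothetical formats for medical record (MRN) and accession (ACC) numbers.
    MRN_PATTERN = re.compile(r"\bMRN[:\s]*\d{6,10}\b", re.IGNORECASE)
    ACC_PATTERN = re.compile(r"\bACC(?:ESSION)?[:#\s]*\d{6,12}\b", re.IGNORECASE)

    def anonymize(report: str, nlp) -> str:
        """Replace PERSON/DATE entities and MRN/ACC identifiers with placeholder tags."""
        doc = nlp(report)
        redacted = report
        # Substitute entities from the end of the string so character offsets stay valid.
        for ent in sorted(doc.ents, key=lambda e: e.start_char, reverse=True):
            if ent.label_ in {"PERSON", "DATE"}:
                tag = f"[{ent.label_}]"
                redacted = redacted[:ent.start_char] + tag + redacted[ent.end_char:]
        redacted = MRN_PATTERN.sub("[MRN]", redacted)
        redacted = ACC_PATTERN.sub("[ACC]", redacted)
        return redacted

    print(anonymize("Chest CT for John Doe, MRN: 1234567, Accession 98765432, 01/02/2023.",
                    nlp_fast))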

RESULTS

NLP₁ and NLP₂ successfully removed all instances of dates (n = 333), medical record numbers (MRN; n = 6), and accession numbers (ACC; n = 92). The LLM model removed all MRNs, 96% of ACCs, and 32% of dates. NLP₁ was the most consistent, with a perfect F1-score of 1.00, followed by NLP₂ with lower precision (0.86) and a lower F1-score (0.92) for dates. The LLM model had perfect precision for MRNs, ACCs, and dates but the lowest recall for ACCs (0.96) and dates (0.52), with corresponding F1-scores of 0.98 and 0.68, respectively. Names were removed completely or almost completely (i.e., at most one first or family name left non-anonymized) in 100% (NLP₁), 72% (NLP₂), and 90% (LLM model) of reports. Importantly, NLP₁ and NLP₂ did not remove any medical information, whereas the LLM model did in 10% of reports (n = 10).
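
As a quick arithmetic check, the F1-scores reported above follow directly from the standard definition F1 = 2 × precision × recall / (precision + recall); the short sketch below recomputes the two LLM values from the reported precision and recall.

    # Recompute the reported F1-scores from precision (P) and recall (R).
    def f1_score(precision: float, recall: float) -> float:
        return 2 * precision * recall / (precision + recall)

    print(round(f1_score(1.00, 0.96), 2))  # LLM model, accession numbers: 0.98, as reported
    print(round(f1_score(1.00, 0.52), 2))  # LLM model, dates: 0.68, as reported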

CONCLUSION

Pre-trained NLP models can effectively anonymize free-text radiology reports, while anonymization with the LLM model is more prone to deleting medical information.

KEY POINTS

Question: This study compares NLP and locally hosted LLM techniques to ensure PHI anonymization without losing clinical information.

Findings: Pre-trained NLP models effectively anonymized radiology reports without removing clinical data, while a locally hosted LLM was less reliable, risking the loss of important information.

Clinical relevance: Fast, reliable, automated anonymization of PHI from radiology reports enables HIPAA-compliant secondary use, facilitating advanced applications like LLM-driven radiology analysis while ensuring ethical handling of sensitive patient data.

