Evaluating GPT models for clinical note de-identification.

Author information

Altalla' Bayan, Abdalla Sameera, Altamimi Ahmad, Bitar Layla, Al Omari Amal, Kardan Ramiz, Sultan Iyad

Affiliations

King Hussein Cancer Center, Queen Rania Street, Amman, Jordan.

Princess Sumaya University for Technology, Khalil Al-Saket St, Amman, Jordan.

Publication information

Sci Rep. 2025 Jan 31;15(1):3852. doi: 10.1038/s41598-025-86890-3.

DOI:10.1038/s41598-025-86890-3

PMID:39890969

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11785955/

Abstract

The rapid digitalization of healthcare has created a pressing need for solutions that manage clinical data securely while ensuring patient privacy. This study evaluates the capabilities of GPT-3.5 and GPT-4 models in de-identifying clinical notes and generating synthetic data, using API access and zero-shot prompt engineering to optimize computational efficiency. Results show that GPT-4 significantly outperformed GPT-3.5, achieving a precision of 0.9925, a recall of 0.8318, an F1 score of 0.8973, and an accuracy of 0.9911. These results demonstrate GPT-4's potential as a powerful tool for safeguarding patient privacy while increasing the availability of clinical data for research. This work sets a benchmark for balancing data utility and privacy in healthcare data management.
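The four reported figures (precision, recall, F1, accuracy) come from a confusion matrix. The sketch below shows how they relate, assuming a binary token-level evaluation where each token is either PHI (protected health information) or non-PHI. This is an assumption for illustration. The study's actual evaluation unit and averaging scheme are not given in this abstract. The names `Counts`, `count_labels`, and `deid_metrics` are hypothetical. The toy example shows one effect: non-PHI tokens usually dominate a note, so accuracy can stay high while recall (the share of PHI actually removed) is modest. That pattern is consistent with the gap between the reported accuracy and recall.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    """Confusion counts for a hypothetical binary token-level de-identification eval."""
    tp: int = 0  # PHI tokens correctly redacted
    fp: int = 0  # non-PHI tokens wrongly redacted (hurts utility)
    fn: int = 0  # PHI tokens left in the note (privacy leaks)
    tn: int = 0  # non-PHI tokens correctly kept


def count_labels(gold: list[bool], pred: list[bool]) -> Counts:
    """Tally aligned labels, where True means 'token is PHI' / 'model redacted it'."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must be aligned token-for-token")
    c = Counts()
    for g, p in zip(gold, pred):
        if g and p:
            c.tp += 1
        elif p:
            c.fp += 1
        elif g:
            c.fn += 1
        else:
            c.tn += 1
    return c


def deid_metrics(c: Counts) -> dict[str, float]:
    """Standard precision/recall/F1/accuracy, guarding against empty denominators."""
    precision = c.tp / (c.tp + c.fp) if (c.tp + c.fp) else 0.0
    recall = c.tp / (c.tp + c.fn) if (c.tp + c.fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    total = c.tp + c.fp + c.fn + c.tn
    accuracy = (c.tp + c.tn) / total if total else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


# Toy note of 10 tokens: 2 are PHI, and the model redacts only one of them.
gold = [False, True, False, False, True, False, False, False, False, False]
pred = [False, True, False, False, False, False, False, False, False, False]
m = deid_metrics(count_labels(gold, pred))
print(m)  # precision 1.0, recall 0.5, F1 ~0.667, accuracy 0.9
```

In this toy case every redaction is correct (precision 1.0) and 9 of 10 tokens are labeled correctly (accuracy 0.9). Even so, half of the PHI leaks through (recall 0.5). For privacy, the false-negative count is usually the figure to watch.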
