De-identification is not enough: a comparison between de-identified and synthetic clinical notes.

Affiliations

Department of Computer Science, University of Manitoba, Winnipeg, R3T 5V6, Canada.

McWilliams School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, 77030, USA.

Publication information

Sci Rep. 2024 Nov 29;14(1):29669. doi: 10.1038/s41598-024-81170-y.

DOI:10.1038/s41598-024-81170-y
PMID:39613846
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11607336/
Abstract

For sharing privacy-sensitive data, de-identification is commonly regarded as adequate for safeguarding privacy. Synthetic data is also being considered as a privacy-preserving alternative. Recent successes with generative models for numerical and tabular data, together with breakthroughs in large generative language models, raise the question of whether synthetically generated clinical notes could be a viable alternative to real notes for research purposes. In this work, we (i) demonstrated that de-identification of real clinical notes does not protect records against a membership inference attack, (ii) proposed a novel approach to generating synthetic clinical notes using current state-of-the-art large language models, (iii) evaluated the performance of the synthetically generated notes in a clinical domain task, and (iv) proposed a way to mount a membership inference attack when the target model is trained with synthetic data. We observed that when synthetically generated notes closely match the performance of real data, they also exhibit privacy concerns similar to those of the real data. Whether other approaches to generating synthetic clinical notes could offer better trade-offs and become a better alternative to sensitive real notes warrants further investigation.
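
For readers unfamiliar with the term, a membership inference attack tries to decide whether a given record was part of a model's training data. The sketch below illustrates one common baseline, a loss-threshold attack, and is not the attack described in the paper. The unigram "target model", the function names, the example notes, and the threshold of 2.5 are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def train_unigram(notes):
    """Toy stand-in for a target model: add-one smoothed unigram probabilities."""
    counts = Counter(tok for note in notes for tok in note.split())
    total = sum(counts.values())
    vocab = len(counts) + 1  # one extra slot reserves mass for unseen tokens

    def prob(tok):
        return (counts[tok] + 1) / (total + vocab)

    return prob


def avg_nll(prob, note):
    """Average negative log-likelihood of a note under the model (its 'loss')."""
    toks = note.split()
    return -sum(math.log(prob(t)) for t in toks) / len(toks)


def infer_membership(prob, note, threshold):
    """Guess 'member' when the model's loss on the note falls below the threshold."""
    return avg_nll(prob, note) < threshold


# Hypothetical notes: two used for training, one held out.
train_notes = [
    "patient admitted with chest pain",
    "patient discharged on aspirin",
]
held_out_note = "child seen for ear infection"

model = train_unigram(train_notes)
threshold = 2.5  # assumed; in practice it would be calibrated on known non-members

for note in train_notes + [held_out_note]:
    print(note, "->", infer_membership(model, note, threshold))
```

The intuition is that a model tends to assign lower loss to records it was trained on, so an unusually low loss hints at membership. Real attacks on clinical-note models are considerably more involved than this toy example.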


