Not Fully Synthetic: LLM-based Hybrid Approaches Towards Privacy-Preserving Clinical Note Sharing.

Authors

Rahman Sarkar Atiquer, Chuang Yao-Shun, Jiang Xiaoqian, Mohammed Noman

Affiliation

University of Manitoba, Winnipeg, Manitoba, Canada.

Publication information

AMIA Jt Summits Transl Sci Proc. 2025 Jun 10;2025:441-450. eCollection 2025.

Abstract

The publication and sharing of clinical notes are crucial for healthcare research and innovation. However, privacy regulations such as HIPAA and GDPR pose significant challenges. While de-identification techniques aim to remove protected health information, they often fall short of complete privacy protection. Similarly, current synthetic clinical note generation can lack nuance and content coverage. To address these limitations, we propose an approach that combines de-identification, filtration, and synthetic clinical note generation. Variations of this approach currently retain 36%-61% of the original note's content and fill the remaining gaps using an LLM, ensuring high information coverage. We also evaluated the de-identification performance of the hybrid notes and found that they match or surpass the standalone de-identification methods. Our results show that hybrid notes can maintain patient privacy while preserving the richness of clinical data. This approach offers a promising solution for safe and effective data sharing and encourages further research.
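
The three stages named in the abstract (de-identification, filtration, and LLM-based gap filling) can be pictured with a minimal sketch. It is illustrative only and is not the authors' pipeline: the regex patterns, the line-level filtering rule, the `max_hits` threshold, and the `generate` callable standing in for an LLM are all assumptions introduced here. It also reports the fraction of original words kept verbatim, which is one possible reading of "retained content".

```python
import re

# Hypothetical PHI patterns (assumption); a real de-identifier covers far more.
PHI_PATTERNS = [
    re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),             # phone-like numbers
    re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b"),         # dates such as 03/14/2021
    re.compile(r"\b(?:Mr|Mrs|Ms|Dr)\. [A-Z][a-z]+"),  # titled surnames
]


def deidentify(line):
    """Replace pattern matches with a tag; return the text and the hit count."""
    hits = 0
    for pattern in PHI_PATTERNS:
        line, n = pattern.subn("[REDACTED]", line)
        hits += n
    return line, hits


def build_hybrid_note(note, generate, max_hits=0):
    """Keep low-risk lines, replace the rest with generated text.

    `generate` is a stand-in for an LLM call (assumption): it receives the
    redacted line and returns synthetic replacement text.
    Returns the hybrid note and the fraction of original words retained.
    """
    hybrid, kept_words, total_words = [], 0, 0
    for line in note.splitlines():
        if not line.strip():
            continue
        n_words = len(line.split())
        total_words += n_words
        redacted, hits = deidentify(line)
        if hits <= max_hits:
            # Filtration step: the line is considered safe enough to keep.
            hybrid.append(redacted)
            kept_words += n_words
        else:
            # Gap left by filtration is filled with synthetic text.
            hybrid.append(generate(redacted))
    retained = kept_words / total_words if total_words else 0.0
    return "\n".join(hybrid), retained
```

In practice the `generate` argument would wrap a call to a language model, and the filtering rule would likely rely on a trained de-identification model rather than a handful of regular expressions.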
