
Not Fully Synthetic: LLM-based Hybrid Approaches Towards Privacy-Preserving Clinical Note Sharing.

Authors

Rahman Sarkar Atiquer, Chuang Yao-Shun, Jiang Xiaoqian, Mohammed Noman

Affiliation

University of Manitoba, Winnipeg, Manitoba, Canada.

Publication

AMIA Jt Summits Transl Sci Proc. 2025 Jun 10;2025:441-450. eCollection 2025.

PMID: 40502247
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12150723/
Abstract

The publication and sharing of clinical notes are crucial for healthcare research and innovation. However, privacy regulations such as HIPAA and GDPR pose significant challenges. While de-identification techniques aim to remove protected health information, they often fall short of achieving complete privacy protection. Similarly, the current state of synthetic clinical note generation can lack nuance and content coverage. To address these limitations, we propose an approach that combines de-identification, filtration, and synthetic clinical note generation. Variations of this approach currently retain 36%-61% of the original note's content and fill the remaining gaps using an LLM, ensuring high information coverage. We also evaluated the de-identification performance of the hybrid notes, demonstrating that they surpass or at least match the standalone de-identification methods. Our results show that hybrid notes can maintain patient privacy while preserving the richness of clinical data. This approach offers a promising solution for safe and effective data sharing, encouraging further research.
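
The abstract names three stages (de-identification, filtration, synthetic generation) without giving implementation detail. The following is only a minimal sketch of how such a hybrid pipeline could be wired together, not the authors' method. The regex patterns, the function names `contains_phi`, `placeholder_generator`, and `build_hybrid_note`, and the sentence-level retention metric are all assumptions made for illustration; a real system would presumably use a trained de-identifier and an actual LLM call in place of the placeholder.

```python
import re
from typing import Callable, List, Tuple

# Hypothetical PHI patterns for illustration only; a real de-identifier
# would rely on a trained model rather than a few regular expressions.
PHI_PATTERNS = [
    re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),                  # phone-like numbers
    re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b"),            # dates such as 03/14/2021
    re.compile(r"\b(?:Mr|Mrs|Ms|Dr)\.\s+[A-Z][a-z]+\b"),   # titled surnames
]


def contains_phi(sentence: str) -> bool:
    """Filtration step (assumed): flag a sentence if any pattern matches."""
    return any(p.search(sentence) for p in PHI_PATTERNS)


def placeholder_generator(sentence: str) -> str:
    """Stand-in for an LLM rewrite: masks matched spans instead of generating text."""
    for pattern in PHI_PATTERNS:
        sentence = pattern.sub("[REDACTED]", sentence)
    return sentence


def build_hybrid_note(
    sentences: List[str],
    fill_gap: Callable[[str], str],
) -> Tuple[List[str], float]:
    """Keep unflagged sentences verbatim and replace flagged ones via fill_gap.

    Returns the hybrid note and the fraction of sentences retained unchanged.
    """
    hybrid: List[str] = []
    kept = 0
    for sentence in sentences:
        if contains_phi(sentence):
            hybrid.append(fill_gap(sentence))  # synthetic replacement
        else:
            hybrid.append(sentence)            # original content retained
            kept += 1
    retention = kept / len(sentences) if sentences else 0.0
    return hybrid, retention


if __name__ == "__main__":
    note = [
        "Patient reports chest pain.",
        "Seen by Dr. Smith on 03/14/2021.",
        "Blood pressure is elevated.",
        "Call 204-555-0199 for follow-up.",
    ]
    hybrid_note, retained = build_hybrid_note(note, placeholder_generator)
    print("\n".join(hybrid_note))
    print(f"Retained fraction: {retained:.2f}")
```

In this toy version the retained fraction is counted per sentence; the 36%-61% figure in the abstract may be measured differently, so the two numbers should not be compared directly.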
