A privacy-preserving distributed filtering framework for NLP artifacts.

Affiliations

Department of Computer Science, University of Manitoba, Winnipeg, MB, R3T 2N2, Canada.

Department of Biomedical Informatics, University of California San Diego, La Jolla, CA, USA.

Publication information

BMC Med Inform Decis Mak. 2019 Sep 7;19(1):183. doi: 10.1186/s12911-019-0867-z.

DOI:10.1186/s12911-019-0867-z
PMID:31493797
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6731605/
Abstract

BACKGROUND

Medical data sharing is a major challenge in biomedicine, and the difficulty often hinders collaborative research. Because of privacy concerns, clinical notes cannot be shared directly. Considerable effort has been devoted to de-identifying clinical notes, but accurately locating and scrubbing every sensitive element automatically remains very challenging. An alternative approach is to remove sentences that might contain sensitive terms related to personal information.

METHODS

A previous study introduced a frequency-based filtering approach that removes sentences containing low-frequency bigrams, improving privacy protection without significantly decreasing utility. Our work extends this method to clinical notes held by distributed sources, taking security and privacy into account. We developed a novel secure protocol based on private set intersection and secure thresholding to identify uncommon and low-frequency terms, which can then be used to guide sentence filtering.
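The frequency-based filtering idea can be illustrated with a minimal, single-site sketch. This is not the authors' implementation: the tokenizer, the function names, and the `min_count` threshold are assumptions made for illustration, and the secure, distributed part of the protocol is left out entirely.

```python
import re
from collections import Counter


def tokenize(sentence):
    # Assumed tokenizer: lowercase alphanumeric runs only.
    return re.findall(r"[a-z0-9]+", sentence.lower())


def bigrams(tokens):
    # Adjacent token pairs, e.g. ["a", "b", "c"] -> [("a", "b"), ("b", "c")].
    return list(zip(tokens, tokens[1:]))


def filter_sentences(sentences, min_count=2):
    """Keep only sentences whose bigrams all occur in >= min_count sentences.

    min_count is a hypothetical threshold, not a value from the paper.
    """
    counts = Counter()
    for s in sentences:
        # Count each bigram once per sentence (sentence-level frequency).
        counts.update(set(bigrams(tokenize(s))))

    kept = []
    for s in sentences:
        # A sentence with fewer than two tokens has no bigrams and is kept.
        if all(counts[b] >= min_count for b in bigrams(tokenize(s))):
            kept.append(s)
    return kept


if __name__ == "__main__":
    notes = [
        "Patient denies chest pain.",
        "Patient denies chest pain.",
        "John Smith denies chest pain.",
    ]
    # The third sentence contains bigrams seen only once, so it is dropped.
    print(filter_sentences(notes))
```

In this toy corpus the sentence containing a name is removed because the bigrams around the name are rare, while the common clinical phrasing survives. In the distributed setting described above, the bigram counts would have to be combined across data owners without revealing each site's individual counts.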

RESULTS

As the computational cost of our proposed framework mostly depends on the cardinality of the intersection of the sets and the number of data owners, we evaluated the framework in terms of these two factors. Experimental results demonstrate that our proposed method is scalable in various experimental settings. In addition, we evaluated our framework in terms of data utility. This evaluation shows that the proposed method is able to retain enough information for data analysis.

CONCLUSION

This work demonstrates the feasibility of using homomorphic encryption to develop a secure and efficient multi-party protocol.
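To make the role of homomorphic encryption concrete, here is a toy sketch of additively homomorphic aggregation using the Paillier cryptosystem. The abstract does not say which scheme the authors used, so Paillier is an assumption, as are all function names and the tiny hard-coded primes (which offer no security). The sketch shows only that per-site counts can be summed under encryption; it does not implement private set intersection or secure thresholding. It requires Python 3.8 or later for the modular inverse via `pow`.

```python
import math
import secrets


def keygen(p=1789, q=1861):
    # Toy primes for illustration only; real deployments need large random primes.
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)  # modular inverse; valid because the generator is n + 1
    return n, (lam, mu)


def encrypt(n, m):
    # Ciphertext c = (n + 1)^m * r^n mod n^2 with a random r coprime to n.
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1  # uniform in 1 .. n - 1
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2


def add_encrypted(n, c1, c2):
    # Multiplying ciphertexts adds the underlying plaintexts modulo n.
    return (c1 * c2) % (n * n)


def decrypt(n, private_key, c):
    lam, mu = private_key
    x = pow(c, lam, n * n)
    return ((x - 1) // n * mu) % n


if __name__ == "__main__":
    n, private_key = keygen()
    # Hypothetical counts of one bigram at three data owners.
    site_counts = [3, 0, 5]
    total = encrypt(n, 0)
    for count in site_counts:
        total = add_encrypted(n, total, encrypt(n, count))
    print(decrypt(n, private_key, total))  # 8
```

Here the key holder decrypts the aggregate and learns the total count. A secure thresholding protocol would aim to reveal only whether the total crosses a threshold, which this sketch does not attempt.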

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/870c/6731605/a7901b09c6c8/12911_2019_867_Fig1_HTML.jpg
Figure 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/870c/6731605/f29acf894f96/12911_2019_867_Fig2_HTML.jpg
Figure 3: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/870c/6731605/5fef6601c7e4/12911_2019_867_Fig3_HTML.jpg
