Secure and scalable deduplication of horizontally partitioned health data for privacy-preserving distributed statistical computation.

Authors

Yigzaw Kassaye Yitbarek, Michalas Antonis, Bellika Johan Gustav

Affiliations

Department of Computer Science, UiT The Arctic University of Norway, 9037, Tromsø, Norway.

Norwegian Centre for E-health Research, University Hospital of North Norway, 9019, Tromsø, Norway.

Publication information

BMC Med Inform Decis Mak. 2017 Jan 3;17(1):1. doi: 10.1186/s12911-016-0389-x.

DOI:10.1186/s12911-016-0389-x
PMID:28049465
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5209873/
Abstract

BACKGROUND

Techniques have been developed to compute statistics on distributed datasets without revealing private information except the statistical results. However, duplicate records in a distributed dataset may lead to incorrect statistical results. Therefore, to increase the accuracy of the statistical analysis of a distributed dataset, secure deduplication is an important preprocessing step.
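To make the effect of duplicates concrete, here is a minimal sketch under assumed data: three hypothetical sites each hold a set of patient identifiers, and two patients appear at more than one site. The site contents and function names are illustrative only and do not come from the paper.

```python
# Hypothetical example: patient identifiers with a positive test at three sites.
# Patients "p02" and "p03" are each registered at two sites.
sites = [
    {"p01", "p02", "p03"},  # site A
    {"p03", "p04"},         # site B
    {"p02", "p05"},         # site C
]


def naive_count(site_sets):
    """Sum the local counts, as a distributed query would without deduplication."""
    return sum(len(s) for s in site_sets)


def distinct_count(site_sets):
    """Count distinct patients across all sites (computed in the clear here)."""
    return len(set().union(*site_sets))


print(naive_count(sites), distinct_count(sites))
```

The union in `distinct_count` is computed on plaintext identifiers, which a privacy-preserving setting does not permit. That gap is what a secure deduplication step is meant to close.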

METHODS

We designed a secure protocol for the deduplication of horizontally partitioned datasets with deterministic record linkage algorithms. We provided a formal security analysis of the protocol in the presence of semi-honest adversaries. The protocol was implemented and deployed across three microbiology laboratories located in Norway, and we ran experiments on the datasets in which the number of records for each laboratory varied. Experiments were also performed on simulated microbiology datasets and data custodians connected through a local area network.
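The abstract does not spell out the protocol, so the following is only a simplified illustration of deterministic record linkage across custodians, not the authors' method. It assumes each custodian derives a keyed hash (HMAC-SHA256) from normalized identifiers using a shared secret, and a hypothetical coordinator compares the resulting tokens. The field names `national_id` and `birth_date`, the key, and the coordinator role are all assumptions made for this sketch.

```python
import hashlib
import hmac
from collections import defaultdict


def linkage_token(record, key):
    """Deterministic linkage: identical normalized identifiers yield identical tokens."""
    fields = ("national_id", "birth_date")  # hypothetical linkage fields
    normalized = "|".join(str(record[f]).strip().lower() for f in fields)
    return hmac.new(key, normalized.encode("utf-8"), hashlib.sha256).hexdigest()


def find_duplicates(tokens_by_custodian):
    """Map each custodian to the tokens it should drop.

    The first custodian in sorted order that holds a token keeps the record;
    every later holder is told to drop it.
    """
    holders = defaultdict(list)
    for custodian in sorted(tokens_by_custodian):
        for token in set(tokens_by_custodian[custodian]):
            holders[token].append(custodian)
    to_drop = defaultdict(set)
    for token, custodians in holders.items():
        for custodian in custodians[1:]:
            to_drop[custodian].add(token)
    return dict(to_drop)


key = b"shared-secret"  # assumption: a key agreed on by the custodians only
shared = {"national_id": "01019012345", "birth_date": "1990-01-01"}
records = {
    "lab_a": [shared, {"national_id": "02029054321", "birth_date": "1990-02-02"}],
    "lab_b": [{"national_id": " 01019012345 ", "birth_date": "1990-01-01"}],
    "lab_c": [{"national_id": "03039011111", "birth_date": "1990-03-03"}],
}
tokens = {c: [linkage_token(r, key) for r in rs] for c, rs in records.items()}
drop = find_duplicates(tokens)
print(sorted(drop))
```

In this sketch the coordinator still learns which custodians share a record, so it offers weaker guarantees than the formally analyzed protocol described in the abstract.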

RESULTS

The security analysis demonstrated that the protocol protects the privacy of individuals and data custodians under a semi-honest adversarial model. More precisely, the protocol remains secure with the collusion of up to N - 2 corrupt data custodians. The total runtime for the protocol scales linearly with the addition of data custodians and records. One million simulated records distributed across 20 data custodians were deduplicated within 45 s. The experimental results showed that the protocol is more efficient and scalable than previous protocols for the same problem.

CONCLUSIONS

The proposed deduplication protocol is efficient and scalable for practical uses while protecting the privacy of patients and data custodians.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bd13/5209873/fe36610c64ca/12911_2016_389_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bd13/5209873/036b284799b7/12911_2016_389_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bd13/5209873/96563d553204/12911_2016_389_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bd13/5209873/36b467e87aa6/12911_2016_389_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bd13/5209873/943316dbe509/12911_2016_389_Fig5_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bd13/5209873/e926543b0161/12911_2016_389_Fig6_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bd13/5209873/903627c10fc7/12911_2016_389_Fig7_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bd13/5209873/711dcc383dcd/12911_2016_389_Fig8_HTML.jpg

Similar articles

1
Secure and scalable deduplication of horizontally partitioned health data for privacy-preserving distributed statistical computation.
BMC Med Inform Decis Mak. 2017 Jan 3;17(1):1. doi: 10.1186/s12911-016-0389-x.
2
Privacy-preserving record linkage in large databases using secure multiparty computation.
BMC Med Genomics. 2018 Oct 11;11(Suppl 4):84. doi: 10.1186/s12920-018-0400-8.
3
Privacy-preserving Statistical Query and Processing on Distributed OpenEHR Data.
Stud Health Technol Inform. 2015;210:766-70.
4
Limited privacy protection and poor sensitivity: Is it time to move on from the statistical linkage key-581?
Health Inf Manag. 2016 Aug;45(2):71-9. doi: 10.1177/1833358316647587. Epub 2016 May 13.
5
Privacy-preserving record linkage on large real world datasets.
J Biomed Inform. 2014 Aug;50:205-12. doi: 10.1016/j.jbi.2013.12.003. Epub 2013 Dec 9.
6
Design and implementation of a privacy preserving electronic health record linkage tool in Chicago.
J Am Med Inform Assoc. 2015 Sep;22(5):1072-80. doi: 10.1093/jamia/ocv038. Epub 2015 Jun 23.
7
Privacy preserving probabilistic record linkage (P3RL): a novel method for linking existing health-related data and maintaining participant confidentiality.
BMC Med Res Methodol. 2015 May 30;15:46. doi: 10.1186/s12874-015-0038-6.
8
Some methods for blindfolded record linkage.
BMC Med Inform Decis Mak. 2004 Jun 28;4:9. doi: 10.1186/1472-6947-4-9.
9
Matching study to registry data: maintaining data privacy in a study on family based colorectal cancer.
Stud Health Technol Inform. 2014;205:808-12.
10
Establishing a framework for privacy-preserving record linkage among electronic health record and administrative claims databases within PCORnet, the National Patient-Centered Clinical Research Network.
BMC Res Notes. 2022 Oct 31;15(1):337. doi: 10.1186/s13104-022-06243-5.

Cited by

1
Supporting Awareness of Dynamic Data: Approaches to Designing and Capturing Data within Interactive Clinical Checklists.
DIS (Des Interact Syst Conf). 2023 Jul;2023:1293-1308. doi: 10.1145/3563657.3595979. Epub 2023 Jul 10.
2
User-centred design of ChestCare: mHealth app for pulmonary rehabilitation for patients with COPD; a mixed-methods sequential approach.
Digit Health. 2025 Jan 17;11:20552076241307476. doi: 10.1177/20552076241307476. eCollection 2025 Jan-Dec.
3
Using an SMS to improve bowel cancer screening: the acceptability and feasibility of a multifaceted intervention.
Fam Pract. 2025 Jan 17;42(1). doi: 10.1093/fampra/cmae073.
4
A Privacy-Preserving Audit and Feedback System for the Antibiotic Prescribing of General Practitioners: Survey Study.
JMIR Form Res. 2022 Jul 13;6(7):e31650. doi: 10.2196/31650.
5
Privacy-preserving data sharing infrastructures for medical research: systematization and comparison.
BMC Med Inform Decis Mak. 2021 Aug 12;21(1):242. doi: 10.1186/s12911-021-01602-x.
6
Balancing Accuracy and Privacy in Federated Queries of Clinical Data Repositories: Algorithm Development and Validation.
J Med Internet Res. 2020 Nov 3;22(11):e18735. doi: 10.2196/18735.
7
Fold-stratified cross-validation for unbiased and privacy-preserving federated learning.
J Am Med Inform Assoc. 2020 Aug 1;27(8):1244-1251. doi: 10.1093/jamia/ocaa096.
8
Privacy-preserving architecture for providing feedback to clinicians on their clinical performance.
BMC Med Inform Decis Mak. 2020 Jun 22;20(1):116. doi: 10.1186/s12911-020-01147-5.

References

1
Federated queries of clinical data repositories: Scaling to a national network.
J Biomed Inform. 2015 Jun;55:231-6. doi: 10.1016/j.jbi.2015.04.012. Epub 2015 May 6.
2
Composite Bloom Filters for Secure Record Linkage.
IEEE Trans Knowl Data Eng. 2014 Dec;26(12):2956-2968. doi: 10.1109/TKDE.2013.91.
3
Clinical research informatics and electronic health record data.
Yearb Med Inform. 2014 Aug 15;9(1):215-23. doi: 10.15265/IY-2014-0009.
4
"Big data" and the electronic health record.
Yearb Med Inform. 2014 Aug 15;9(1):97-104. doi: 10.15265/IY-2014-0003.
5
Clinical research data warehouse governance for distributed research networks in the USA: a systematic review of the literature.
J Am Med Inform Assoc. 2014 Jul-Aug;21(4):730-6. doi: 10.1136/amiajnl-2013-002370. Epub 2014 Mar 28.
6
Health data use, stewardship, and governance: ongoing gaps and challenges: a report from AMIA's 2012 Health Policy Meeting.
J Am Med Inform Assoc. 2014 Mar-Apr;21(2):204-11. doi: 10.1136/amiajnl-2013-002117. Epub 2013 Oct 29.
7
The effect of data cleaning on record linkage quality.
BMC Med Inform Decis Mak. 2013 Jun 5;13:64. doi: 10.1186/1472-6947-13-64.
8
Federated queries of clinical data repositories: the sum of the parts does not equal the whole.
J Am Med Inform Assoc. 2013 Jun;20(e1):e155-61. doi: 10.1136/amiajnl-2012-001299. Epub 2013 Jan 24.
9
An evaluation of the rates of repeat notifiable disease reporting and patient crossover using a health information exchange-based automated electronic laboratory reporting system.
AMIA Annu Symp Proc. 2012;2012:1229-36. Epub 2012 Nov 3.
10
A glimpse of the next 100 years in medicine.
N Engl J Med. 2012 Dec 27;367(26):2538-9. doi: 10.1056/NEJMe1213371.