Secure and scalable deduplication of horizontally partitioned health data for privacy-preserving distributed statistical computation.

Author information

Yigzaw Kassaye Yitbarek, Michalas Antonis, Bellika Johan Gustav

Affiliations

Department of Computer Science, UiT The Arctic University of Norway, 9037, Tromsø, Norway.

Norwegian Centre for E-health Research, University Hospital of North Norway, 9019, Tromsø, Norway.

Publication information

BMC Med Inform Decis Mak. 2017 Jan 3;17(1):1. doi: 10.1186/s12911-016-0389-x.

Abstract

BACKGROUND

Techniques have been developed to compute statistics on distributed datasets without revealing private information except the statistical results. However, duplicate records in a distributed dataset may lead to incorrect statistical results. Therefore, to increase the accuracy of the statistical analysis of a distributed dataset, secure deduplication is an important preprocessing step.
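To make the problem concrete, the following minimal sketch shows how duplicates inflate a simple count when local results are just added together. The site names and patient identifiers are invented for illustration and do not come from the paper.

```python
# Hypothetical data: each site holds records for a set of patient identifiers.
# Patients p1 and p3 appear at more than one site.
site_records = {
    "lab_a": {"p1", "p2", "p3"},
    "lab_b": {"p3", "p4"},
    "lab_c": {"p1", "p5"},
}

def naive_total(sites):
    """Sum the local counts, as a distributed computation without deduplication would."""
    return sum(len(ids) for ids in sites.values())

def deduplicated_total(sites):
    """Count each patient once across all sites (requires seeing identifiers in the clear)."""
    return len(set().union(*sites.values()))

print(naive_total(site_records))         # counts p1 and p3 twice
print(deduplicated_total(site_records))  # counts every patient once
```

The second function pools raw identifiers, which is exactly what a privacy-preserving setting forbids. That is why deduplication has to be performed securely before the statistics are computed.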

METHODS

We designed a secure protocol for the deduplication of horizontally partitioned datasets with deterministic record linkage algorithms. We provided a formal security analysis of the protocol in the presence of semi-honest adversaries. The protocol was implemented and deployed across three microbiology laboratories located in Norway, and we ran experiments on the datasets in which the number of records for each laboratory varied. Experiments were also performed on simulated microbiology datasets and data custodians connected through a local area network.
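The abstract does not spell out the protocol's steps, so the sketch below is not the authors' method. It only illustrates the general idea of deterministic record linkage on pseudonymized keys, using a keyed hash (HMAC) as a stand-in technique. The field names `national_id` and `birth_date`, the shared secret, and the comparison step are all assumptions, and this simplification does not provide the collusion resistance claimed for the published protocol.

```python
import hashlib
import hmac

# Assumed linkage fields; the paper's actual linkage variables are not given here.
LINKAGE_FIELDS = ("national_id", "birth_date")

def linkage_key(record):
    """Deterministic linkage: two records match only if every normalized field agrees exactly."""
    return "|".join(str(record[f]).strip().lower() for f in LINKAGE_FIELDS)

def pseudonymize(record, shared_secret):
    """Replace the linkage key with a keyed hash so that raw identifiers are not exchanged."""
    key = linkage_key(record).encode("utf-8")
    return hmac.new(shared_secret, key, hashlib.sha256).hexdigest()

def find_duplicates(tokens_by_custodian):
    """For each custodian, return the tokens already held by an earlier custodian.

    Custodians are visited in sorted name order, so the first holder keeps the record.
    """
    seen = set()
    to_drop = {}
    for custodian in sorted(tokens_by_custodian):
        tokens = set(tokens_by_custodian[custodian])
        to_drop[custodian] = tokens & seen
        seen |= tokens
    return to_drop
```

In this sketch each custodian would call `pseudonymize` locally and send only the resulting tokens for comparison, then remove the records whose tokens are returned in `to_drop`. Anyone who holds the shared secret can test guesses against the tokens, which is one reason a real protocol needs stronger guarantees than this.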

RESULTS

The security analysis demonstrated that the protocol protects the privacy of individuals and data custodians under a semi-honest adversarial model. More precisely, the protocol remains secure with the collusion of up to N - 2 corrupt data custodians. The total runtime for the protocol scales linearly with the addition of data custodians and records. One million simulated records distributed across 20 data custodians were deduplicated within 45 s. The experimental results showed that the protocol is more efficient and scalable than previous protocols for the same problem.

CONCLUSIONS

The proposed deduplication protocol is efficient and scalable for practical uses while protecting the privacy of patients and data custodians.
