Privacy-preserving record linkage in large databases using secure multiparty computation.

Affiliations

Cybernetica AS, Ülikooli 2, Tartu, 51003, Estonia.

STACC, Ülikooli 2, Tartu, 51003, Estonia.

Publication information

BMC Med Genomics. 2018 Oct 11;11(Suppl 4):84. doi: 10.1186/s12920-018-0400-8.

Abstract

BACKGROUND

Practical data analysis applications may require combining multiple databases belonging to different owners, such as health centers. The analysis should be performed without violating the privacy of either the centers themselves or the patients whose records these centers store. To avoid biased analysis results, it may be important to remove duplicate records among the centers, so that each patient's data is taken into account only once. This task is very closely related to privacy-preserving record linkage.

METHODS

This paper presents a solution to privacy-preserving deduplication among the records of several databases using secure multiparty computation. It is built upon one of the fastest practical secure multiparty computation platforms, called Sharemind.
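
To give a feel for the kind of building block that secure multiparty computation relies on, the following is a minimal Python sketch of additive secret sharing among three computing servers. It is an illustration only: it is not the deduplication protocol of the paper and not the Sharemind API. The function names, the modulus 2^32, and the choice of three parties are all assumptions made for this example.

```python
import secrets

MODULUS = 2 ** 32   # assumed ring size, chosen only for this illustration
NUM_PARTIES = 3     # assumed number of computing servers


def share(value):
    """Split value into NUM_PARTIES additive shares modulo MODULUS.

    All shares but the last are uniformly random; the last one is chosen
    so that the shares sum to the value. Any NUM_PARTIES - 1 shares alone
    are uniformly distributed and reveal nothing about the value.
    """
    shares = [secrets.randbelow(MODULUS) for _ in range(NUM_PARTIES - 1)]
    last = (value - sum(shares)) % MODULUS
    return shares + [last]


def reconstruct(shares):
    """Recover the secret by adding all shares modulo MODULUS."""
    return sum(shares) % MODULUS


def add_shared(a_shares, b_shares):
    """Add two shared values; each server only adds its own two shares."""
    return [(a + b) % MODULUS for a, b in zip(a_shares, b_shares)]


if __name__ == "__main__":
    x = share(7)
    y = share(35)
    print(reconstruct(add_shared(x, y)))  # sum of the two secrets
```

Addition needs no communication between the servers, which is why it is cheap. Operations such as the equality tests needed to find duplicate records require interactive protocols that are beyond this sketch.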

RESULTS

The tests on ca. 10 million records of simulated databases, with 1000 health centers of 10000 records each, show that the computation is feasible in practice. The expected running time of the experiment is ca. 30 min for computing servers connected over a 100 Mbit/s WAN, the expected error of the results is 2^-40, and no errors have been detected for the particular test set that we used for our benchmarks.

CONCLUSIONS

The solution is ready for practical use. It has well-defined security properties, implied by the properties of the Sharemind platform. The solution assumes that exact matching of records is required, and a possible direction for future research is extending it to approximate matching.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/25cc/6180364/5e7b66341419/12920_2018_400_Fig1_HTML.jpg
