Department of Computer Science, Technische Universität Darmstadt, 64289 Darmstadt, Germany.
Department of Computer Science, Humboldt-Universität zu Berlin, 10099 Berlin, Germany.
Bioinformatics. 2022 Mar 4;38(6):1657-1668. doi: 10.1093/bioinformatics/btaa764.
Record linkage has versatile applications in real-world data analysis, where several datasets need to be linked at the record level in the absence of any exact identifier connecting related records. Examples are medical databases of patients, spread across institutions, that have to be linked on personally identifiable entries such as name, date of birth or ZIP code. At the same time, privacy laws may prohibit the exchange of this personally identifiable information (PII) across institutional boundaries, ruling out outsourcing the record linkage task to a trusted third party. We propose to employ privacy-preserving record linkage (PPRL) techniques that prevent, to various degrees, the leakage of PII while still allowing related records to be linked.
We develop a framework for fault-tolerant PPRL using secure multi-party computation (sMPC), with the medical record keeping software Mainzelliste as the data source. Our solution does not rely on any trusted third party, and all PII is guaranteed not to leak under common cryptographic security assumptions. Benchmarks show the feasibility of our approach in realistic networking settings: linking a patient record against a database of 10 000 records takes 48 s over a heavily delayed (100 ms) network connection, or 3.9 s over a low-latency connection.
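To illustrate what "fault-tolerant" linkage of one record against a database means, the following is a minimal plaintext sketch. It is not the authors' implementation: the bigram Dice similarity, the field weights, the threshold and all function names are assumptions chosen for illustration, and the framework described above would evaluate such a comparison inside sMPC so that neither party sees the other's PII.

```python
# Plaintext illustration only. A PPRL system would compute this
# comparison under secure multi-party computation instead.

def bigrams(s):
    """Return the set of character bigrams of a normalized string."""
    s = s.lower().strip()
    return {s[i:i + 2] for i in range(len(s) - 1)}

def dice(a, b):
    """Dice coefficient on bigram sets; tolerates small typos."""
    x, y = bigrams(a), bigrams(b)
    if not x and not y:
        # Strings shorter than two characters: fall back to equality.
        return 1.0 if a.lower().strip() == b.lower().strip() else 0.0
    return 2 * len(x & y) / (len(x) + len(y))

# Hypothetical field weights (assumption, not taken from the paper).
WEIGHTS = {"name": 0.5, "dob": 0.3, "zip": 0.2}

def score(rec_a, rec_b, weights=WEIGHTS):
    """Weighted average of per-field similarities, in [0, 1]."""
    total = sum(weights.values())
    return sum(w * dice(rec_a[f], rec_b[f]) for f, w in weights.items()) / total

def best_match(query, database, threshold=0.8):
    """Return (record, score) for the best candidate, or (None, score)
    if the best score stays below the (hypothetical) threshold."""
    best = max(database, key=lambda r: score(query, r))
    s = score(query, best)
    return (best, s) if s >= threshold else (None, s)

# Toy data: the query misspells "Schmidt" as "Schmitt".
DATABASE = [
    {"name": "Schmidt", "dob": "1980-02-17", "zip": "64289"},
    {"name": "Meyer", "dob": "1975-11-03", "zip": "10099"},
]
QUERY = {"name": "Schmitt", "dob": "1980-02-17", "zip": "64289"}

match, similarity = best_match(QUERY, DATABASE)
print(match, round(similarity, 3))
```

In this sketch the misspelled name still links to the intended record because the similarity is graded rather than exact, which is the property an exact-identifier join lacks.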
The source code of the sMPC node is freely available on GitHub at https://github.com/medicalinformatics/SecureEpilinker under the AGPLv3 license. The source code of the modified Mainzelliste is available at https://github.com/medicalinformatics/MainzellisteSEL.
Supplementary data are available at Bioinformatics online.