Optimization of the Mainzelliste software for fast privacy-preserving record linkage.

Affiliations

Database Group, University of Leipzig, Leipzig, Germany.

Federated Information Systems, German Cancer Research Center, Heidelberg, Germany.

Publication details

J Transl Med. 2021 Jan 15;19(1):33. doi: 10.1186/s12967-020-02678-1.

DOI:10.1186/s12967-020-02678-1
PMID:33451317
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7809773/
Abstract

BACKGROUND

Data analysis for biomedical research often requires a record linkage step to identify records from multiple data sources referring to the same person. Due to the lack of unique personal identifiers across these sources, record linkage relies on the similarity of personal data such as first and last names or birth dates. However, the exchange of such identifying data with a third party, as is the case in record linkage, is generally subject to strict privacy requirements. This problem is addressed by privacy-preserving record linkage (PPRL) and pseudonymization services. Mainzelliste is an open-source record linkage and pseudonymization service used to carry out PPRL processes in real-world use cases.
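
To make "similarity of personal data" concrete, here is a minimal sketch of one comparator that is common in the record-linkage literature: the Dice coefficient over character bigrams of a name. The abstract does not say which string comparator Mainzelliste uses, so this is an assumption, and the function names (`qgrams`, `dice`, `name_similarity`) are hypothetical.

```python
def qgrams(value: str, q: int = 2) -> set:
    """Return the set of q-grams of a lower-cased string padded with '_'."""
    padded = f"_{value.strip().lower()}_"
    return {padded[i:i + q] for i in range(len(padded) - q + 1)}


def dice(a: set, b: set) -> float:
    """Dice coefficient 2|A & B| / (|A| + |B|); defined as 1.0 for two empty sets."""
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))


def name_similarity(x: str, y: str) -> float:
    """Hypothetical plaintext comparator: bigram Dice similarity of two names."""
    return dice(qgrams(x), qgrams(y))


if __name__ == "__main__":
    print(name_similarity("Meyer", "Meier"))   # 4 shared of 6 + 6 bigrams -> 0.6666666666666666
    print(name_similarity("Meyer", "Schulz"))  # no shared bigrams -> 0.0
```

A score close to 1 lets a spelling variant such as "Meier" still be recognized as the same person as "Meyer", which an exact-match comparison would miss.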

METHODS

We evaluate the linkage quality and runtime performance of the linkage process on several real and near-real datasets that differ in size and in the error rate of matching records. We compare (plaintext) record linkage with PPRL based on encoded records (Bloom filters). Furthermore, since the Mainzelliste software offers no blocking mechanism, we extend it with phonetic blocking as well as novel blocking schemes based on locality-sensitive hashing (LSH) to improve the runtime of both standard and privacy-preserving record linkage.
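
For readers unfamiliar with Bloom-filter PPRL, the sketch below shows the general idea of a field-level Bloom filter: each bigram of a field is hashed by `K` keyed hash functions into an `M`-bit array, and two encodings are compared with the Dice coefficient on their set bits. The filter length, number of hash functions, HMAC-SHA256 double hashing, and the `SECRET` key are all illustrative assumptions, not the parameters or hashing scheme used by Mainzelliste or the paper.

```python
import hashlib
import hmac

M = 500                    # hypothetical filter length in bits
K = 15                     # hypothetical number of hash functions per q-gram
SECRET = b"shared-secret"  # hypothetical key, assumed known only to the data holders


def qgrams(value: str, q: int = 2) -> set:
    """Return the set of q-grams of a lower-cased string padded with '_'."""
    padded = f"_{value.strip().lower()}_"
    return {padded[i:i + q] for i in range(len(padded) - q + 1)}


def encode_field(value: str, m: int = M, k: int = K, key: bytes = SECRET) -> int:
    """Encode one field into an m-bit Bloom filter, stored as an int bitmask.

    Each q-gram sets k bits chosen by keyed double hashing: pos_i = (h1 + i*h2) mod m.
    """
    bits = 0
    for gram in qgrams(value):
        data = gram.encode("utf-8")
        h1 = int.from_bytes(hmac.new(key, b"h1" + data, hashlib.sha256).digest(), "big")
        h2 = int.from_bytes(hmac.new(key, b"h2" + data, hashlib.sha256).digest(), "big")
        for i in range(k):
            bits |= 1 << ((h1 + i * h2) % m)
    return bits


def popcount(x: int) -> int:
    """Number of set bits in the bitmask."""
    return bin(x).count("1")


def bloom_dice(a: int, b: int) -> float:
    """Dice similarity of two Bloom filters: 2|A AND B| / (|A| + |B|)."""
    total = popcount(a) + popcount(b)
    return 1.0 if total == 0 else 2 * popcount(a & b) / total
```

Because similar names share most bigrams, they also share most set bits, so `bloom_dice(encode_field("Meyer"), encode_field("Meier"))` stays high while the linkage unit never sees the plaintext names.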

RESULTS

The Mainzelliste achieves high linkage quality for PPRL using field-level Bloom filters because it uses an error-tolerant matching algorithm that can handle variations in names, in particular missing or transposed name compounds. However, due to the absence of blocking, runtimes are unacceptably long for real-world use cases with larger datasets. The newly implemented blocking approaches improve runtimes by orders of magnitude while retaining high linkage quality.
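
The abstract does not describe how Mainzelliste tolerates missing or transposed name compounds. As a hypothetical illustration only, the sketch below splits each name into compounds at spaces and hyphens and scores the best alignment of the shorter compound list against ordered selections from the longer one. The splitting rule and the helpers `compounds` and `compound_similarity` are assumptions.

```python
import re
from itertools import permutations


def qgrams(value: str, q: int = 2) -> set:
    """Return the set of q-grams of a lower-cased string padded with '_'."""
    padded = f"_{value.strip().lower()}_"
    return {padded[i:i + q] for i in range(len(padded) - q + 1)}


def name_similarity(x: str, y: str) -> float:
    """Bigram Dice similarity; q-gram sets are never empty thanks to padding."""
    a, b = qgrams(x), qgrams(y)
    return 2 * len(a & b) / (len(a) + len(b))


def compounds(name: str) -> list:
    """Split a name into compounds at spaces and hyphens (hypothetical normalization)."""
    return [part for part in re.split(r"[\s\-]+", name.strip().lower()) if part]


def compound_similarity(x: str, y: str) -> float:
    """Best average similarity over all alignments of the shorter compound list
    to an ordered selection of compounds from the longer one.

    Transposed compounds ("Anna Maria" vs "Maria Anna") score 1.0, and a missing
    compound ("Anna Maria" vs "Anna") is not penalized at all in this sketch.
    """
    parts_x, parts_y = compounds(x), compounds(y)
    if not parts_x or not parts_y:
        return 0.0
    shorter, longer = sorted((parts_x, parts_y), key=len)
    best = 0.0
    for selection in permutations(longer, len(shorter)):
        score = sum(name_similarity(s, l) for s, l in zip(shorter, selection)) / len(shorter)
        best = max(best, score)
    return best
```

A plain whole-string comparison scores "Anna Maria" against "Maria Anna" below 1.0, whereas the compound-aware version treats them as identical. Enumerating permutations grows factorially, which is acceptable only because names rarely have more than two or three compounds; a production matcher would also weigh the lack of a penalty for missing compounds against the risk of false matches.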

CONCLUSION

We conduct the first comprehensive evaluation of the record linkage facilities of the Mainzelliste software and extend it with blocking methods to improve its runtime. We observed very high linkage quality for both plaintext and encoded data, even in the presence of errors. The provided blocking methods improve runtime by orders of magnitude, thus facilitating use in research projects with large datasets and many participants.
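
The abstract names LSH-based blocking without detailing the schemes. As a hedged illustration of why blocking cuts runtime, the sketch below applies Hamming LSH (bit sampling) to Bloom filters: each of several tables samples a fixed set of bit positions, and only record pairs that agree on all sampled bits in at least one table become candidate pairs for the expensive comparison. The table count, bits per key, seed, and the unkeyed `encode` helper are illustrative assumptions, not the paper's configuration.

```python
import hashlib
import random
from collections import defaultdict

M = 256  # hypothetical Bloom filter length in bits


def qgrams(value: str, q: int = 2) -> set:
    """Return the set of q-grams of a lower-cased string padded with '_'."""
    padded = f"_{value.strip().lower()}_"
    return {padded[i:i + q] for i in range(len(padded) - q + 1)}


def encode(value: str, m: int = M, k: int = 10) -> int:
    """Unkeyed Bloom filter encoding, for illustration only (bitmask in an int)."""
    bits = 0
    for gram in qgrams(value):
        digest = hashlib.sha256(gram.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:16], "big")
        h2 = int.from_bytes(digest[16:], "big")
        for i in range(k):
            bits |= 1 << ((h1 + i * h2) % m)
    return bits


def make_tables(num_tables: int = 5, bits_per_key: int = 8, m: int = M, seed: int = 7) -> list:
    """Hamming LSH: each table samples a fixed set of distinct bit positions."""
    rng = random.Random(seed)
    return [rng.sample(range(m), bits_per_key) for _ in range(num_tables)]


def blocking_keys(bf: int, tables: list) -> list:
    """One key per table: the table id plus the filter's bits at the sampled positions."""
    return [(t, tuple((bf >> pos) & 1 for pos in positions)) for t, positions in enumerate(tables)]


def candidate_pairs(filters_a: list, filters_b: list, tables: list) -> set:
    """Pairs (i, j) that share at least one blocking key; only these are compared in detail."""
    index = defaultdict(set)
    for j, bf in enumerate(filters_b):
        for key in blocking_keys(bf, tables):
            index[key].add(j)
    pairs = set()
    for i, bf in enumerate(filters_a):
        for key in blocking_keys(bf, tables):
            pairs.update((i, j) for j in index.get(key, ()))
    return pairs


if __name__ == "__main__":
    a = [encode(n) for n in ["Meyer", "Schulz", "Anna Maria", "Becker"]]
    b = [encode(n) for n in ["Meier", "Schultz", "Maria Anna", "Becker"]]
    pairs = candidate_pairs(a, b, make_tables())
    print(f"{len(pairs)} candidate pairs instead of {len(a) * len(b)}")
```

Without blocking, every one of the |A| × |B| pairs must be compared; with blocking, only the returned candidates are. More bits per key make blocks more selective but risk missing true matches, which additional tables compensate for.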

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d4d7/7809773/1aa19e9d8ad9/12967_2020_2678_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d4d7/7809773/15591b27466f/12967_2020_2678_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d4d7/7809773/afb016d5f269/12967_2020_2678_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d4d7/7809773/5f0d5c25aa06/12967_2020_2678_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d4d7/7809773/42c7959ebf3a/12967_2020_2678_Fig5_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d4d7/7809773/a1188a799677/12967_2020_2678_Fig6_HTML.jpg
