Parallel privacy preservation through partitioning (P4): a scalable data anonymization algorithm for health data.

Authors

Halilovic Mehmed, Meurers Thierry, Otte Karen, Prasser Fabian

Affiliation

Berlin Institute of Health at Charité - Universitätsmedizin Berlin, Medical Informatics Group, Charitéplatz 1, 10117, Berlin, Germany.

Publication

BMC Med Inform Decis Mak. 2025 Mar 12;25(1):129. doi: 10.1186/s12911-025-02959-z.

DOI:10.1186/s12911-025-02959-z
PMID:40075355
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11905666/
Abstract

BACKGROUND

Sharing health data holds great potential for advancing medical research but also poses many challenges, including the need to protect people's privacy. One approach to addressing this is data anonymization, the process of altering or transforming a dataset to protect the privacy of the individuals who contributed the data. To this end, privacy models have been designed to measure risks, and optimization algorithms can be used to transform data so that they strike a good balance between risk reduction and preservation of the dataset's utility. However, this process is computationally complex and challenging to apply to large datasets. Previously suggested parallel algorithms have been tailored to specific risk models, utility models and transformation methods.
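The risk/utility trade-off described above can be made concrete with a minimal sketch. It uses k-anonymity as the privacy model and a generalization hierarchy as the transformation. The abstract does not name the specific models P4 supports, so k-anonymity, the age/postcode hierarchy in `generalize`, the toy records and all helper names are illustrative assumptions. The search returns the least generalized level whose smallest equivalence class has at least k records. In this toy hierarchy, that is the option that keeps the most utility while still meeting the risk threshold.

```python
from collections import Counter

# Hypothetical toy setting: (age, postcode) are the quasi-identifiers.
MAX_LEVEL = 2

def generalize(age: int, postcode: str, level: int) -> tuple[str, str]:
    """Coarsen one record: level 0 keeps values, level 1 uses 10-year age
    bands and 3-digit postcode prefixes, level 2 suppresses both."""
    if level == 0:
        return str(age), postcode
    if level == 1:
        low = age // 10 * 10
        return f"{low}-{low + 9}", postcode[:3] + "**"
    return "*", "*"

def smallest_class(records, level: int) -> int:
    """Size of the smallest equivalence class, i.e. the smallest group of
    records that become indistinguishable after generalization."""
    classes = Counter(generalize(age, pc, level) for age, pc in records)
    return min(classes.values())

def max_risk(records, level: int) -> float:
    """Worst-case re-identification risk: 1 / smallest class size."""
    return 1 / smallest_class(records, level)

def lowest_safe_level(records, k: int) -> int:
    """Least generalized level meeting k-anonymity (max risk <= 1/k)."""
    for level in range(MAX_LEVEL + 1):
        if smallest_class(records, level) >= k:
            return level
    raise ValueError("fewer than k records: no level is k-anonymous")

records = [(34, "10117"), (37, "10115"), (31, "10119"),
           (52, "10245"), (58, "10247"), (55, "10249")]
print(max_risk(records, 0), max_risk(records, 1))  # 1.0 0.3333333333333333
print(lowest_safe_level(records, 3))  # 1
```

Real tools search much larger spaces of transformations and use richer utility measures. This sketch reduces the optimization to a single-dimension scan.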

METHODS

We present a novel parallel algorithm that supports a wide range of methods for measuring risks, optimizing utility and transforming data. The algorithm trades data utility for parallelization by anonymizing partitions of the dataset in parallel. To ensure the correctness of the anonymization process, the algorithm carefully controls the process and, if needed, rearranges partitions and performs additional transformations.
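The partition-then-anonymize idea can be sketched in a toy setting. This is a hypothetical illustration, not P4 itself, and it rests on several assumptions:

- k-anonymity is the privacy model.
- Records are split into contiguous chunks.
- Any chunk with fewer than k records is merged into a neighbour. The function `rebalance` is a simple stand-in for P4's partition rearrangement, whose actual rules the abstract does not describe.
- Chunks are anonymized concurrently.

For k-anonymity, combining k-anonymous chunks keeps the result k-anonymous, because adding data can only enlarge equivalence classes. This need not hold for every privacy model, which is presumably one reason a general algorithm needs the additional control the abstract mentions.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

MAX_LEVEL = 2

def generalize(age, postcode, level):
    """Hypothetical hierarchy: exact values, then 10-year age bands with
    3-digit postcode prefixes, then full suppression."""
    if level == 0:
        return str(age), postcode
    if level == 1:
        low = age // 10 * 10
        return f"{low}-{low + 9}", postcode[:3] + "**"
    return "*", "*"

def anonymize_partition(part, k):
    """Apply the least generalization that makes this partition k-anonymous."""
    for level in range(MAX_LEVEL + 1):
        out = [generalize(age, pc, level) for age, pc in part]
        if min(Counter(out).values()) >= k:
            return out
    raise ValueError("partition has fewer than k records")

def rebalance(partitions, k):
    """Merge partitions with fewer than k records into a neighbour
    (hypothetical stand-in for P4's partition rearrangement)."""
    merged = []
    for part in partitions:
        if merged and len(part) < k:
            merged[-1] = merged[-1] + list(part)
        else:
            merged.append(list(part))
    # Only the first partition can still be undersized; fold it forward.
    if len(merged) > 1 and len(merged[0]) < k:
        merged[1] = merged[0] + merged[1]
        merged.pop(0)
    return merged

def parallel_anonymize(records, k, n_parts):
    """Split into contiguous chunks, rebalance, anonymize chunks concurrently."""
    size = -(-len(records) // n_parts)  # ceiling division
    parts = [records[i:i + size] for i in range(0, len(records), size)]
    parts = rebalance(parts, k)
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(anonymize_partition, parts, [k] * len(parts)))
    return [row for part in results for row in part]

# Interleaved toy data: each half mixes both age/postcode groups.
records = [(34, "10117"), (52, "10245"), (37, "10115"),
           (58, "10247"), (31, "10119"), (55, "10249")]
print(parallel_anonymize(records, k=3, n_parts=1)[:2])  # [('30-39', '101**'), ('50-59', '102**')]
print(parallel_anonymize(records, k=3, n_parts=2)[:2])  # [('*', '*'), ('*', '*')]
```

On this interleaved data, a single pass needs only 10-year bands. Splitting into two chunks, however, forces full suppression, which gives a small-scale picture of the utility cost of parallelization. The thread pool keeps the sketch simple. In CPython, CPU-bound anonymization would need processes or distributed workers to see real speedups.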

RESULTS

We demonstrate the effectiveness of our method through an open-source implementation. Our experiments show that our approach can reduce execution times by up to one order of magnitude with minor impacts on output data utility in a wide range of scenarios.

CONCLUSIONS

Our novel P4 algorithm for parallel and distributed data anonymization is, to the best of our knowledge, the first to systematically support a wide variety of privacy, transformation and utility models.

Figures:
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/274f/11905666/8f11ff7af5ec/12911_2025_2959_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/274f/11905666/ea83e7b86522/12911_2025_2959_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/274f/11905666/373297b79dff/12911_2025_2959_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/274f/11905666/6bd4c1fe4bdb/12911_2025_2959_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/274f/11905666/f4b1dcc0cc8a/12911_2025_2959_Fig5_HTML.jpg

Similar articles

1
Parallel privacy preservation through partitioning (P4): a scalable data anonymization algorithm for health data.
BMC Med Inform Decis Mak. 2025 Mar 12;25(1):129. doi: 10.1186/s12911-025-02959-z.
2
The cost of quality: Implementing generalization and suppression for anonymizing biomedical data with minimal information loss.
J Biomed Inform. 2015 Dec;58:37-48. doi: 10.1016/j.jbi.2015.09.007. Epub 2015 Sep 15.
3
The Costs of Anonymization: Case Study Using Clinical Data.
J Med Internet Res. 2024 Apr 24;26:e49445. doi: 10.2196/49445.
4
A scalable software solution for anonymizing high-dimensional biomedical data.
Gigascience. 2021 Oct 4;10(10). doi: 10.1093/gigascience/giab068.
5
Exploring the tradeoff between data privacy and utility with a clinical data analysis use case.
BMC Med Inform Decis Mak. 2024 May 30;24(1):147. doi: 10.1186/s12911-024-02545-9.
6
Utility-preserving anonymization for health data publishing.
BMC Med Inform Decis Mak. 2017 Jul 11;17(1):104. doi: 10.1186/s12911-017-0499-0.
7
Better Safe than Sorry - Implementing Reliable Health Data Anonymization.
Stud Health Technol Inform. 2020 Jun 16;270:68-72. doi: 10.3233/SHTI200124.
8
Privacy preserving data anonymization of spontaneous ADE reporting system dataset.
BMC Med Inform Decis Mak. 2016 Jul 18;16 Suppl 1(Suppl 1):58. doi: 10.1186/s12911-016-0293-4.
9
A multi-institution evaluation of clinical profile anonymization.
J Am Med Inform Assoc. 2016 Apr;23(e1):e131-7. doi: 10.1093/jamia/ocv154. Epub 2015 Nov 13.
10
Privacy of Study Participants in Open-access Health and Demographic Surveillance System Data: Requirements Analysis for Data Anonymization.
JMIR Public Health Surveill. 2022 Sep 2;8(9):e34472. doi: 10.2196/34472.
