Utility-driven assessment of anonymized data via clustering.

Affiliations

Universidade da Beira Interior, Covilha, Portugal and CEMAPRE, Lisboa, Portugal.

Universidade da Beira Interior and Instituto de Telecomunicações (IT-UBI), Covilha, Portugal.

Publication information

Sci Data. 2022 Jul 30;9(1):456. doi: 10.1038/s41597-022-01561-6.

DOI:10.1038/s41597-022-01561-6
PMID:35907927
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9339002/
Abstract

In this study, clustering is conceived as an auxiliary tool to identify groups of special interest. This approach was applied to a real dataset covering an entire Portuguese cohort of higher education Law students. Several anonymized clustering scenarios were compared against the original cluster solution. The clustering techniques were explored as data utility models in the context of data anonymization, using k-anonymity and (ε, δ)-differential privacy as privacy models. The purpose was to assess the utility of anonymized data by standard metrics, by the characteristics of the groups obtained, and by the relative risk (a relevant metric in social sciences research). For self-containment, we present an overview of anonymization and clustering methods. We used a partitional clustering algorithm and analyzed several clustering validity indices to understand to what extent the data structure is preserved after data anonymization. The results suggest that for datasets of low dimensionality or cardinality, the anonymization procedure easily jeopardizes the clustering endeavor. In addition, there is evidence that relevant field-of-study estimates obtained from anonymized data are biased.
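The interplay between k-anonymity and the relative risk mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the toy records, the field names (`age`, `sex`, `dropout`), the bin width, and the helper functions are all hypothetical, and no clustering step is included. It only shows how generalizing a quasi-identifier can make a table 2-anonymous while shifting a relative-risk estimate.

```python
from collections import Counter

def is_k_anonymous(records, quasi_identifiers, k):
    """Return True if every quasi-identifier combination occurs at least k times."""
    counts = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return all(c >= k for c in counts.values())

def generalize_age(record, width=10):
    """Replace the exact age with an interval label such as '20-29' (hypothetical scheme)."""
    low = (record["age"] // width) * width
    return {**record, "age": f"{low}-{low + width - 1}"}

def relative_risk(records, exposed, outcome):
    """Risk of the outcome among exposed records divided by the risk among unexposed ones."""
    exp = [r for r in records if exposed(r)]
    unexp = [r for r in records if not exposed(r)]
    risk_exp = sum(outcome(r) for r in exp) / len(exp)
    risk_unexp = sum(outcome(r) for r in unexp) / len(unexp)
    return risk_exp / risk_unexp

# Invented toy data, not taken from the study's dataset.
students = [
    {"age": 18, "sex": "F", "dropout": False},
    {"age": 19, "sex": "F", "dropout": False},
    {"age": 18, "sex": "M", "dropout": True},
    {"age": 19, "sex": "M", "dropout": False},
    {"age": 27, "sex": "F", "dropout": True},
    {"age": 29, "sex": "F", "dropout": True},
    {"age": 25, "sex": "M", "dropout": True},
    {"age": 28, "sex": "M", "dropout": False},
]

anonymized = [generalize_age(s) for s in students]

# Every (age, sex) pair is unique before generalization, so 2-anonymity fails;
# after binning ages into decades, each pair occurs twice.
original_ok = is_k_anonymous(students, ["age", "sex"], k=2)
anonymized_ok = is_k_anonymous(anonymized, ["age", "sex"], k=2)

# Relative risk of dropout for students aged 26 or older, on the original data.
rr_original = relative_risk(students, lambda r: r["age"] >= 26, lambda r: r["dropout"])

# After generalization the threshold 26 falls inside a bin, so the closest
# available exposure definition is the whole '20-29' interval.
rr_anonymized = relative_risk(
    anonymized, lambda r: r["age"] == "20-29", lambda r: r["dropout"]
)

print(original_ok, anonymized_ok, rr_original, rr_anonymized)
```

In this toy case the estimate changes because the exposure group can only be defined at the granularity of the generalized bins, which is one simple way an anonymized dataset can yield a biased estimate.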


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/81f4/9339002/88ccb4547c11/41597_2022_1561_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/81f4/9339002/743975c31477/41597_2022_1561_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/81f4/9339002/34194b2d069f/41597_2022_1561_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/81f4/9339002/c6f373fb6cd0/41597_2022_1561_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/81f4/9339002/eb38fcb5b962/41597_2022_1561_Fig5_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/81f4/9339002/b465ec77ffb3/41597_2022_1561_Fig6_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/81f4/9339002/b0a56c871da2/41597_2022_1561_Fig7_HTML.jpg

Similar articles

1
Utility-driven assessment of anonymized data via clustering.
Sci Data. 2022 Jul 30;9(1):456. doi: 10.1038/s41597-022-01561-6.
2
Privacy preserving data anonymization of spontaneous ADE reporting system dataset.
BMC Med Inform Decis Mak. 2016 Jul 18;16 Suppl 1(Suppl 1):58. doi: 10.1186/s12911-016-0293-4.
3
The Costs of Anonymization: Case Study Using Clinical Data.
J Med Internet Res. 2024 Apr 24;26:e49445. doi: 10.2196/49445.
4
Utility-preserving anonymization for health data publishing.
BMC Med Inform Decis Mak. 2017 Jul 11;17(1):104. doi: 10.1186/s12911-017-0499-0.
5
Utility-Preserving Anonymization in a Real-World Scenario: Evidence from the German Chronic Kidney Disease (GCKD) Study.
Stud Health Technol Inform. 2023 May 18;302:28-32. doi: 10.3233/SHTI230058.
6
Utilization of anonymization techniques to create an external control arm for clinical trial data.
BMC Med Res Methodol. 2023 Nov 4;23(1):258. doi: 10.1186/s12874-023-02082-5.
7
Privacy-Preserving Anonymity for Periodical Releases of Spontaneous Adverse Drug Event Reporting Data: Algorithm Development and Validation.
JMIR Med Inform. 2021 Oct 28;9(10):e28752. doi: 10.2196/28752.
8
Experiments and Analyses of Anonymization Mechanisms for Trajectory Data Publishing.
J Comput Sci Technol. 2022;37(5):1026-1048. doi: 10.1007/s11390-022-2409-x. Epub 2022 Sep 30.
9
Designing a Novel Approach Using a Greedy and Information-Theoretic Clustering-Based Algorithm for Anonymizing Microdata Sets.
Entropy (Basel). 2023 Dec 1;25(12):1613. doi: 10.3390/e25121613.
10
The cost of quality: Implementing generalization and suppression for anonymizing biomedical data with minimal information loss.
J Biomed Inform. 2015 Dec;58:37-48. doi: 10.1016/j.jbi.2015.09.007. Epub 2015 Sep 15.

Cited by

1
The Costs of Anonymization: Case Study Using Clinical Data.
J Med Internet Res. 2024 Apr 24;26:e49445. doi: 10.2196/49445.
2
Data Quality- and Utility-Compliant Anonymization of Common Data Model-Harmonized Electronic Health Record Data: Protocol for a Scoping Review.
JMIR Res Protoc. 2023 Aug 11;12:e46471. doi: 10.2196/46471.
