Digression and Value Concatenation to Enable Privacy-Preserving Regression.

Author information

Li Xiao-Bai, Sarkar Sumit

Affiliations

Department of Operations and Information Systems, Manning School of Business, University of Massachusetts Lowell, Lowell, MA 01854 U.S.A.

Naveen Jindal School of Management, University of Texas at Dallas, Richardson, TX 75080 U.S.A.

Publication information

MIS Q. 2014 Sep;38(3):679-698. doi: 10.25300/misq/2014/38.3.03.

Abstract

Regression techniques can be used not only for legitimate data analysis, but also to infer private information about individuals. In this paper, we demonstrate that regression trees, a popular data-analysis and data-mining technique, can be used to effectively reveal individuals' sensitive data. This problem, which we call a "regression attack," has not been addressed in the data privacy literature, and existing privacy-preserving techniques are not suited to coping with it. We propose a new approach to counter regression attacks. To protect against privacy disclosure, our approach introduces a novel measure, called digression, which assesses the risk of sensitive-value disclosure while a regression tree model is being built. Specifically, we develop an algorithm that uses the digression measure to prune the tree and thereby limit disclosure of sensitive data. We also propose a dynamic value-concatenation method for anonymizing data, which preserves data utility better than the user-defined generalization schemes commonly used in existing approaches. Our approach can be used to anonymize both numeric and categorical data. We conduct an experimental study using real-world financial, economic, and healthcare data. The results demonstrate that the proposed approach is effective in protecting data privacy while preserving data quality for research and analysis.
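
The abstract does not define the digression measure or the exact concatenation procedure, so the following Python sketch only illustrates the general ideas under stated assumptions. It grows a small regression tree on hypothetical toy data (ages as the quasi-identifier, salaries as the sensitive value) and shows how a tightly clustered leaf exposes sensitive values. A hypothetical risk test, `risky`, which uses a standard-deviation threshold `tau` and is NOT the paper's digression measure, blocks splits that would create such leaves. The paper prunes the tree, whereas this sketch simply refuses risky splits while growing it. A hypothetical `concatenate` function then anonymizes each resulting group by joining the distinct values it contains rather than mapping them to a user-defined generalization hierarchy.

```python
from statistics import mean, pstdev


def sse(ys):
    """Sum of squared errors around the group mean (standard regression-tree impurity)."""
    m = mean(ys)
    return sum((v - m) ** 2 for v in ys)


def risky(ys, tau):
    """HYPOTHETICAL stand-in for a disclosure-risk test, NOT the paper's digression
    measure: a group counts as disclosive when its sensitive values are so tightly
    clustered (population std dev below tau) that the leaf mean reveals them."""
    return pstdev(ys) < tau


def build(idx, X, y, tau, min_leaf=2):
    """Grow a regression tree on numeric attributes over the record indices idx,
    rejecting any split that would create a disclosive child (a simple
    risk-aware stopping rule assumed for illustration, not the paper's pruning)."""
    ys = [y[i] for i in idx]
    best = None
    for j in range(len(X[0])):
        # Candidate thresholds: every distinct value except the largest.
        for t in sorted({X[i][j] for i in idx})[:-1]:
            left = [i for i in idx if X[i][j] <= t]
            right = [i for i in idx if X[i][j] > t]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            ly, ry = [y[i] for i in left], [y[i] for i in right]
            if risky(ly, tau) or risky(ry, tau):
                continue  # this split would expose sensitive values
            cost = sse(ly) + sse(ry)
            if best is None or cost < best[0]:
                best = (cost, j, t, left, right)
    # Stop when no admissible split exists or no split reduces the error.
    if best is None or best[0] >= sse(ys):
        return {"leaf": idx}
    _, j, t, left, right = best
    return {"feature": j, "threshold": t,
            "left": build(left, X, y, tau, min_leaf),
            "right": build(right, X, y, tau, min_leaf)}


def leaves(node):
    """Yield the record-index list of each leaf, left to right."""
    if "leaf" in node:
        yield node["leaf"]
    else:
        yield from leaves(node["left"])
        yield from leaves(node["right"])


def concatenate(idx, X):
    """HYPOTHETICAL value concatenation: publish each attribute of a leaf group as
    the '|'-joined distinct values it contains, instead of a user-defined range."""
    return ["|".join(str(v) for v in sorted({X[i][j] for i in idx}))
            for j in range(len(X[0]))]


if __name__ == "__main__":
    X = [[25], [27], [31], [45], [47], [52]]  # toy quasi-identifier: age
    y = [40, 42, 41, 90, 95, 88]              # toy sensitive value: salary ($1000s)
    for tau in (0.0, 0.9):
        tree = build(list(range(len(X))), X, y, tau)
        for leaf in leaves(tree):
            print(tau, concatenate(leaf, X), [y[i] for i in leaf])
```

With `tau = 0.0` (no protection), the tree groups ages 25, 27 and 31, whose salaries (40, 42, 41) all lie within 1 of the leaf mean of 41. That is the kind of leaf a regression attack exploits. With `tau = 0.9`, that split is rejected and the published groups become `25|27`, `31|45` and `47|52`.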


Similar articles

Anonymizing 1:M microdata with high utility.
Knowl Based Syst. 2017 Jan 1;115:15-26. doi: 10.1016/j.knosys.2016.10.012. Epub 2016 Oct 21.

Utility-preserving anonymization for health data publishing.
BMC Med Inform Decis Mak. 2017 Jul 11;17(1):104. doi: 10.1186/s12911-017-0499-0.

Anonymizing and Sharing Medical Text Records.
Inf Syst Res. 2017;28(2):332-352. doi: 10.1287/isre.2016.0676. Epub 2017 Apr 12.