Digression and Value Concatenation to Enable Privacy-Preserving Regression.

Author information

Li Xiao-Bai, Sarkar Sumit

Affiliations

Department of Operations and Information Systems, Manning School of Business, University of Massachusetts Lowell, Lowell, MA 01854 U.S.A.

Naveen Jindal School of Management, University of Texas at Dallas, Richardson, TX 75080 U.S.A.

Publication information

MIS Q. 2014 Sep;38(3):679-698. doi: 10.25300/misq/2014/38.3.03.

DOI:10.25300/misq/2014/38.3.03

PMID:26752802

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC4703130/

Abstract

Regression techniques can be used not only for legitimate data analysis, but also to infer private information about individuals. In this paper, we demonstrate that regression trees, a popular data-analysis and data-mining technique, can be used to effectively reveal individuals' sensitive data. This problem, which we call a "regression attack," has not been addressed in the data privacy literature, and existing privacy-preserving techniques are not appropriate for coping with this problem. We propose a new approach to counter regression attacks. To protect against privacy disclosure, our approach introduces a novel measure, called digression, which assesses the sensitive value disclosure risk in the process of building a regression tree model. Specifically, we develop an algorithm that uses the measure for pruning the tree to limit disclosure of sensitive data. We also propose a dynamic value-concatenation method for anonymizing data, which better preserves data utility than a user-defined generalization scheme commonly used in existing approaches. Our approach can be used for anonymizing both numeric and categorical data. An experimental study is conducted using real-world financial, economic and healthcare data. The results of the experiments demonstrate that the proposed approach is very effective in protecting data privacy while preserving data quality for research and analysis.
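
As a rough illustration of the idea summarized above, the following sketch shows how a single regression-tree split can isolate a leaf whose sensitive values are almost identical, so that the leaf mean effectively discloses them, and how collapsing the split limits that disclosure. Everything here is an assumption made for illustration: the records are synthetic, the names `best_split`, `is_risky`, and `release_leaves` are hypothetical, and the relative-spread test is a simple stand-in, not the paper's digression measure or its pruning algorithm.

```python
from statistics import mean, pstdev

# Synthetic toy records (age, salary); salary plays the role of the sensitive target.
RECORDS = [(25, 40_000), (27, 40_500), (29, 39_800),
           (45, 90_000), (50, 70_000), (55, 110_000)]


def sse(rows):
    """Sum of squared errors of the sensitive values around their mean."""
    ys = [y for _, y in rows]
    m = mean(ys)
    return sum((y - m) ** 2 for y in ys)


def best_split(rows):
    """Age threshold that minimises the total SSE of the two resulting leaves."""
    ages = sorted({a for a, _ in rows})
    candidates = [(lo + hi) / 2 for lo, hi in zip(ages, ages[1:])]

    def cost(t):
        left = [r for r in rows if r[0] <= t]
        right = [r for r in rows if r[0] > t]
        return sse(left) + sse(right)

    return min(candidates, key=cost)


def is_risky(rows, tol=0.05):
    """Stand-in disclosure test: population std. dev. is under tol times the leaf mean."""
    ys = [y for _, y in rows]
    return pstdev(ys) / mean(ys) < tol


def release_leaves(rows, tol=0.05):
    """Split once, but collapse back to the root if either leaf looks risky."""
    t = best_split(rows)
    left = [r for r in rows if r[0] <= t]
    right = [r for r in rows if r[0] > t]
    if is_risky(left, tol) or is_risky(right, tol):
        return [rows]  # pruned: only the root node is released
    return [left, right]


threshold = best_split(RECORDS)
young = [r for r in RECORDS if r[0] <= threshold]
print(threshold, round(mean(y for _, y in young)), is_risky(young))
print(len(release_leaves(RECORDS)))
```

With this toy data the split separates the three younger records, whose salaries differ by well under one percent of their mean, so that leaf is flagged and the split is collapsed. A looser or tighter `tol` changes the trade-off between disclosure risk and how much structure the released tree keeps.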

