Feretzakis Georgios, Kalles Dimitris, Verykios Vassilios S
School of Science and Technology, Hellenic Open University, Patras 263 35, Greece.
Entropy (Basel). 2019 Jan 14;21(1):66. doi: 10.3390/e21010066.
Data sharing among organizations has become an increasingly common practice in sectors such as advertising, marketing, electronic commerce, banking, and insurance. However, an organization that shares its datasets with others will most likely try to keep some patterns as hidden as possible. This paper focuses on preserving the privacy of sensitive patterns when inducing decision trees. We adopt a record augmentation approach to hide critical classification rules in binary datasets. This hiding methodology is preferred over other heuristic solutions, such as output perturbation or cryptographic techniques, which limit the usability of the data, because the raw data itself remains readily available for public use. We propose a look-ahead technique that uses linear Diophantine equations to add the appropriate number of instances while maintaining the initial entropy of the nodes. This method can be used to hide one or more decision tree rules optimally.
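As a minimal sketch of why Diophantine equations appear here, assume a binary class and read "maintaining the initial entropy" of a node holding `pos` positive and `neg` negative instances as a constraint on how many instances `x` and `y` of each class may be added. Keeping the class proportions gives `neg*x - pos*y = 0`. Because binary entropy is symmetric, exactly swapping the proportions also preserves it, which gives the non-homogeneous equation `pos*x - neg*y = neg**2 - pos**2`, solvable with the extended Euclidean algorithm. The helper names (`entropy`, `extended_gcd`, `min_nonneg_solution`, `keep_ratio`, `swap_ratio`) and both equations are illustrative assumptions, not the authors' published formulation, and the sketch does not model the look-ahead over the tree.

```python
from math import gcd, isclose, log2
from typing import Optional


def entropy(pos: int, neg: int) -> float:
    """Binary Shannon entropy (bits) of a node holding pos/neg instances."""
    total = pos + neg
    h = 0.0
    for count in (pos, neg):
        if count:
            q = count / total
            h -= q * log2(q)
    return h


def extended_gcd(a: int, b: int) -> tuple[int, int, int]:
    """Return (g, s, t) with a*s + b*t == g == gcd(a, b), for a, b >= 0."""
    if b == 0:
        return a, 1, 0
    g, s, t = extended_gcd(b, a % b)
    return g, t, s - (a // b) * t


def min_nonneg_solution(a: int, b: int, c: int) -> Optional[tuple[int, int]]:
    """Smallest x, y >= 0 with a*x - b*y == c (a, b > 0), or None if unsolvable."""
    g, s, t = extended_gcd(a, b)
    if c % g != 0:
        return None  # a solution exists only if gcd(a, b) divides c
    scale = c // g
    x0, y0 = s * scale, -t * scale  # one particular solution
    step_x, step_y = b // g, a // g  # general solution: x0 + step_x*k, y0 + step_y*k
    # Both coordinates grow with k, so take the smallest k making both >= 0.
    k = max(-(x0 // step_x), -(y0 // step_y))
    return x0 + step_x * k, y0 + step_y * k


def keep_ratio(pos: int, neg: int) -> tuple[int, int]:
    """Smallest positive (x, y) with neg*x - pos*y == 0: proportions unchanged."""
    g = gcd(pos, neg)
    return pos // g, neg // g


def swap_ratio(pos: int, neg: int) -> Optional[tuple[int, int]]:
    """Smallest (x, y) >= 0 making (pos + x):(neg + y) equal neg:pos.

    pos*(pos + x) == neg*(neg + y)  <=>  pos*x - neg*y == neg**2 - pos**2
    """
    return min_nonneg_solution(pos, neg, neg * neg - pos * pos)


if __name__ == "__main__":
    pos, neg = 2, 3
    for name, (x, y) in (("keep", keep_ratio(pos, neg)), ("swap", swap_ratio(pos, neg))):
        same = isclose(entropy(pos, neg), entropy(pos + x, neg + y))
        print(name, (x, y), same)
    # keep (2, 3) True
    # swap (4, 1) True
```

For a node with counts (2, 3), adding (4, 1) yields (6, 4): the class proportions are swapped from 0.4/0.6 to 0.6/0.4, so the entropy is unchanged. The `isclose` comparison is used because the two entropy sums are accumulated in a different order and may differ in the last floating-point bit.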