Kumar G Sathish, Premalatha K
Department of Computer Science and Engineering, Sri Krishna College of Engineering and Technology, Coimbatore, Tamil Nadu, India.
Department of Computer Science and Engineering, Bannari Amman Institute of Technology, Erode, Tamil Nadu, India.
Distrib Parallel Databases. 2023 Apr 21:1-34. doi: 10.1007/s10619-023-07423-3.
Sharing data with multiple organizations is essential for analysis in many situations. However, the shared data contain individuals' private and sensitive information, which can result in privacy breaches. Privacy-preserving data mining (PPDM) has emerged as a solution to these challenges. This work addresses PPDM by proposing the statistical transformation with intuitionistic fuzzy (STIF) algorithm for data perturbation. The STIF algorithm combines two statistical methods, weight of evidence and information value, with an intuitionistic fuzzy Gaussian membership function. The algorithm is applied to three benchmark datasets: adult income, bank marketing, and lung cancer. Four classifier models (decision tree, random forest, extreme gradient boost, and support vector machine) are used for accuracy and performance analysis. The results show that the STIF algorithm achieves 99% accuracy on the adult income dataset and 100% accuracy on both the bank marketing and lung cancer datasets. The results further indicate that the STIF algorithm outperforms state-of-the-art algorithms in data perturbation capacity and privacy-preserving capacity, without any information loss on either numerical or categorical data.
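The abstract names three building blocks (weight of evidence, information value, and an intuitionistic fuzzy Gaussian membership function) without giving their formulas or saying how STIF combines them. The sketch below illustrates only the textbook form of each block and is not the authors' implementation. Several choices are assumptions made for illustration: the function names `woe_iv` and `if_gaussian`, the sign convention WoE = ln(share of label-0 rows / share of label-1 rows), the additive `smoothing` term, and the Sugeno-type non-membership with parameter `lam`.

```python
import math
from collections import Counter


def woe_iv(categories, labels, smoothing=0.5):
    """Return ({category: WoE}, IV) for one categorical feature and 0/1 labels.

    Assumed convention: WoE = ln(share of label-0 rows / share of label-1 rows).
    """
    pos = Counter(c for c, y in zip(categories, labels) if y == 1)
    neg = Counter(c for c, y in zip(categories, labels) if y == 0)
    cats = sorted(set(categories))
    # Additive smoothing (an assumption) avoids log(0) for one-class categories.
    total_pos = sum(pos.values()) + smoothing * len(cats)
    total_neg = sum(neg.values()) + smoothing * len(cats)
    woe, iv = {}, 0.0
    for c in cats:
        p_pos = (pos[c] + smoothing) / total_pos
        p_neg = (neg[c] + smoothing) / total_neg
        woe[c] = math.log(p_neg / p_pos)
        # Each term is non-negative, so IV measures overall separation.
        iv += (p_neg - p_pos) * woe[c]
    return woe, iv


def if_gaussian(x, center, sigma, lam=1.0):
    """Return (membership, non-membership, hesitation) for a value x.

    Membership is a Gaussian bump around `center`. Non-membership uses a
    Sugeno-type complement (an assumption, not taken from the paper), which
    guarantees membership + non-membership <= 1.
    """
    mu = math.exp(-((x - center) ** 2) / (2 * sigma ** 2))
    nu = (1 - mu) / (1 + lam * mu)
    pi = 1 - mu - nu
    return mu, nu, pi


# Toy data, invented for illustration only.
categories = ["a", "a", "a", "b", "b", "b"]
labels = [1, 1, 0, 0, 0, 1]
woe, iv = woe_iv(categories, labels)
mu, nu, pi = if_gaussian(woe["a"], center=0.0, sigma=1.0)
```

In this toy run each category is replaced by its WoE score, and the score is then mapped to a membership triple. Whether STIF chains the steps in this order is not stated in the abstract, so the last two lines should be read as one possible arrangement only.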