Eyupoglu Can, Aydin Muhammed Ali, Zaim Abdul Halim, Sertbas Ahmet
Department of Computer Engineering, Istanbul Commerce University, Istanbul 34840, Turkey.
Department of Computer Engineering, Istanbul University, Istanbul 34320, Turkey.
Entropy (Basel). 2018 May 17;20(5):373. doi: 10.3390/e20050373.
The topic of big data has attracted increasing interest in recent years. The emergence of big data creates new difficulties for the protection models used for data privacy, which is essential for sharing and processing data. The most important challenge in privacy preservation is protecting individuals' sensitive information while maintaining the usability of the published data set. To this end, data anonymization methods are used to protect data against identity disclosure and linking attacks. In this study, a novel data anonymization algorithm based on chaos and perturbation is proposed for preserving privacy and utility in big data. The performance of the proposed algorithm is evaluated in terms of Kullback-Leibler divergence, probabilistic anonymity, classification accuracy, F-measure and execution time. The experimental results show that the proposed algorithm is efficient and outperforms most existing algorithms evaluated on the same data set in terms of Kullback-Leibler divergence, classification accuracy and F-measure. Because it applies chaos to perturb the data, the algorithm is a promising candidate for privacy-preserving data mining and data publishing.
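The abstract names chaos-driven perturbation and Kullback-Leibler divergence without giving details, so the following is only a minimal sketch of how the two ideas might fit together, not the authors' algorithm. It assumes a logistic map as the chaotic function, a hypothetical threshold rule (`flip_threshold`) for deciding which values to replace, and a single categorical attribute; all function names, parameters and the toy data are illustrative assumptions.

```python
import math
from collections import Counter


def logistic_map(x0, r=3.99, n=1):
    """Yield n successive values of the logistic map x -> r * x * (1 - x).

    The logistic map is an assumed choice of chaotic function; the
    abstract does not say which one the paper uses.
    """
    x = x0
    for _ in range(n):
        x = r * x * (1.0 - x)
        yield x


def perturb(values, domain, x0=0.37, flip_threshold=0.8):
    """Replace some values with chaos-selected values from `domain`.

    Hypothetical rule: two chaotic numbers are drawn per record. The
    first decides whether to replace the value, the second picks the
    replacement from the attribute domain.
    """
    chaos = logistic_map(x0, n=2 * len(values))
    out = []
    for v in values:
        decide = next(chaos)
        pick = next(chaos)
        if decide > flip_threshold:
            idx = min(int(pick * len(domain)), len(domain) - 1)
            out.append(domain[idx])
        else:
            out.append(v)
    return out


def kl_divergence(original, perturbed, domain, eps=1e-9):
    """Return D_KL(P || Q) in bits between two empirical distributions.

    P is estimated from `original`, Q from `perturbed`. Q is floored at
    `eps` so that a value missing from the perturbed data does not cause
    a division by zero.
    """
    p_counts, q_counts = Counter(original), Counter(perturbed)
    n_p, n_q = len(original), len(perturbed)
    total = 0.0
    for value in domain:
        p = p_counts[value] / n_p
        q = q_counts[value] / n_q
        if p > 0:
            total += p * math.log2(p / max(q, eps))
    return total


if __name__ == "__main__":
    # Toy categorical attribute (made-up data, for illustration only).
    domain = ["private", "public", "self-employed"]
    original = ["private"] * 60 + ["public"] * 30 + ["self-employed"] * 10
    anonymized = perturb(original, domain)
    print("KL divergence (bits):", kl_divergence(original, anonymized, domain))
```

A lower divergence means the perturbed attribute distribution stays closer to the original, which is the sense in which the metric reflects utility. In this sketch, raising `flip_threshold` perturbs fewer records, trading privacy for utility.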