School of Computer Science and Engineering, Xi'an University of Technology, Xi'an, ShaanXi, China.
School of Science, Xi'an University of Technology, Xi'an, ShaanXi, China.
PLoS One. 2018 Nov 21;13(11):e0206832. doi: 10.1371/journal.pone.0206832. eCollection 2018.
This paper, based on differential privacy protecting K-means clustering algorithm, realizes privacy protection by adding data-disturbing Laplace noise to cluster center point. In order to solve the problem of Laplace noise randomness which causes the center point to deviate, especially when poor availability of clustering results appears because of small privacy budget parameters, an improved differential privacy protecting K-means clustering algorithm was raised in this paper. The improved algorithm uses the contour coefficients to quantitatively evaluate the clustering effect of each iteration and add different noise to different clusters. In order to be adapted to the huge number of data, this paper provides an algorithm design in MapReduce Framework. Experimental finding shows that the new algorithm improves the availability of the algorithm clustering results under the condition of ensuring individual privacy without significantly increasing its operating time.
本文基于差分隐私保护 K-means 聚类算法,通过向聚类中心点添加数据干扰拉普拉斯噪声来实现隐私保护。为了解决拉普拉斯噪声随机性导致中心点偏离的问题,特别是当由于隐私预算参数较小而导致聚类结果可用性较差时,本文提出了一种改进的差分隐私保护 K-means 聚类算法。改进的算法使用轮廓系数来定量评估每次迭代的聚类效果,并对不同的聚类添加不同的噪声。为了适应大量的数据,本文在 MapReduce 框架中提供了一种算法设计。实验结果表明,新算法在不显著增加运行时间的情况下,提高了算法聚类结果的可用性,同时确保了个体隐私。