School of Computer Science and Engineering, Xi'an University of Technology, Xi'an, Shaanxi, China.
PLoS One. 2021 Mar 23;16(3):e0248737. doi: 10.1371/journal.pone.0248737. eCollection 2021.
The fuzzy C-means clustering algorithm is one of the typical clustering algorithms used in data mining applications. However, because datasets often contain sensitive information, there is a risk that user privacy is leaked during the clustering process. Fuzzy C-means clustering with differential privacy protection can protect individual users' privacy while mining patterns in the data, but the loss of utility caused by perturbing the data is a common problem for such algorithms. To address the loss of accuracy caused by randomly initializing the membership matrix of fuzzy C-means, this paper first uses the maximum distance method to determine the initial center points. Then, the Gaussian value of the cluster center point is used to calculate the privacy budget allocation ratio. Finally, Laplace noise is added to complete the differential privacy protection. The experimental results demonstrate that the clustering accuracy and effectiveness of the proposed algorithm are higher than those of the baselines under the same privacy protection intensity.
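The pipeline described in the abstract (maximum-distance initialization, a per-iteration privacy budget, Laplace noise on the center update) can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several details are assumptions: the function names are hypothetical, the data are assumed to be scaled to [0, 1] in every dimension, the L1 sensitivity of the weighted sums is taken as d + 1 on that basis, and the budget is split uniformly across iterations because the abstract does not specify how the Gaussian-value allocation ratio is computed.

```python
import numpy as np


def max_distance_init(X, c, rng):
    """Farthest-point heuristic: each new center is the point that is
    farthest from its nearest already-chosen center."""
    first = int(rng.integers(X.shape[0]))  # random first center (assumption)
    centers = [X[first]]
    nearest = np.linalg.norm(X - X[first], axis=1)
    for _ in range(1, c):
        idx = int(np.argmax(nearest))
        centers.append(X[idx])
        nearest = np.minimum(nearest, np.linalg.norm(X - X[idx], axis=1))
    return np.array(centers)


def memberships(X, centers, m=2.0):
    """Standard fuzzy C-means membership update with fuzzifier m."""
    dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    dist = np.maximum(dist, 1e-12)  # avoid division by zero at a center
    inv = dist ** (-2.0 / (m - 1.0))
    return inv / inv.sum(axis=1, keepdims=True)


def noisy_center_update(X, U, epsilon, rng, m=2.0):
    """One center update with Laplace noise on the weighted sums.

    Assumes X lies in [0, 1]^d, so one record changes the numerators and
    denominators by at most d + 1 in L1 norm (illustrative assumption).
    """
    d = X.shape[1]
    W = U ** m                      # shape (n, c)
    num = W.T @ X                   # weighted coordinate sums, shape (c, d)
    den = W.sum(axis=0)             # weighted counts, shape (c,)
    scale = (d + 1) / epsilon
    num = num + rng.laplace(0.0, scale, size=num.shape)
    den = den + rng.laplace(0.0, scale, size=den.shape)
    den = np.maximum(den, 1e-6)     # keep the noisy denominator positive
    return np.clip(num / den[:, None], 0.0, 1.0)


def dp_fcm(X, c, epsilon, iters=10, seed=0):
    """Differentially private FCM sketch with a uniform budget split.

    The uniform split epsilon / iters stands in for the paper's
    Gaussian-value allocation ratio, which the abstract does not define.
    """
    rng = np.random.default_rng(seed)
    centers = max_distance_init(X, c, rng)
    for _ in range(iters):
        U = memberships(X, centers)
        centers = noisy_center_update(X, U, epsilon / iters, rng)
    return centers
```

Calling `dp_fcm(X, c=3, epsilon=1.0)` on an array `X` of shape `(n, d)` returns `c` noisy centers. Note that in this sketch the initialization itself reads the raw data without noise, so a full privacy accounting would also have to cover that step.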