Kamalika Chaudhuri, Claire Monteleoni, Anand D. Sarwate
Department of Computer Science and Engineering, University of California, San Diego, La Jolla, CA 92093, USA.
J Mach Learn Res. 2011 Mar;12:1069-1109.
Privacy-preserving machine learning algorithms are crucial for the increasingly common setting in which personal data, such as medical or financial records, are analyzed. We provide general techniques to produce privacy-preserving approximations of classifiers learned via (regularized) empirical risk minimization (ERM). These algorithms are private under the ε-differential privacy definition due to Dwork et al. (2006). First, we apply the output perturbation ideas of Dwork et al. (2006) to ERM classification. We then propose a new method, objective perturbation, for designing privacy-preserving machine learning algorithms. This method entails perturbing the objective function before optimizing over classifiers. If the loss and regularizer satisfy certain convexity and differentiability criteria, we prove that our algorithms preserve privacy, and we provide generalization bounds for linear and nonlinear kernels. We further present a privacy-preserving technique for tuning the parameters of general machine learning algorithms, thereby providing end-to-end privacy guarantees for the training process. We apply these results to produce privacy-preserving analogues of regularized logistic regression and support vector machines, and we obtain encouraging results when evaluating their performance on real demographic and benchmark data sets. Our results show, both theoretically and empirically, that objective perturbation is superior to the previous state of the art, output perturbation, in managing the inherent tradeoff between privacy and learning performance.
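To make the objective-perturbation idea concrete, the sketch below applies it to regularized logistic regression: a random linear term is added to the regularized empirical risk before optimizing, with the noise norm drawn from a Gamma distribution and the direction uniform on the sphere. This is a minimal illustration, assuming feature vectors are scaled to L2 norm at most 1 and labels are in {-1, +1}; the function name, hyperparameter values, and the fallback taken when the adjusted privacy budget is non-positive are simplifications for exposition, not the paper's exact algorithm.

```python
import numpy as np

def private_logreg_objective_perturbation(X, y, lam=0.1, eps=1.0,
                                          steps=500, lr=0.1, rng=None):
    """Sketch of objective perturbation for L2-regularized logistic
    regression. Assumes each row of X has L2 norm <= 1 and y is +/-1.
    All names and defaults here are illustrative, not canonical."""
    rng = np.random.default_rng(rng)
    n, d = X.shape
    # The logistic loss has |l''| <= c = 1/4; part of the privacy budget
    # pays for the curvature of the loss relative to the regularizer:
    c = 0.25
    eps_prime = eps - np.log(1 + 2 * c / (n * lam) + c**2 / (n * lam) ** 2)
    if eps_prime <= 0:
        # Simplified fallback: strengthen the regularizer and split the
        # budget (the paper instead adds an extra quadratic term).
        lam = c / (n * (np.exp(eps / 4) - 1))
        eps_prime = eps / 2
    # Noise vector b with density proportional to exp(-eps'/2 * ||b||):
    # uniform direction, norm ~ Gamma(shape=d, scale=2/eps').
    direction = rng.normal(size=d)
    direction /= np.linalg.norm(direction)
    b = rng.gamma(shape=d, scale=2.0 / eps_prime) * direction
    # Minimize (1/n) sum_i log(1 + exp(-y_i w.x_i))
    #          + (lam/2)||w||^2 + (1/n) b.w   by gradient descent.
    w = np.zeros(d)
    for _ in range(steps):
        margins = y * (X @ w)
        grad_loss = -(X * (y / (1 + np.exp(margins)))[:, None]).mean(axis=0)
        w -= lr * (grad_loss + lam * w + b / n)
    return w
```

Output perturbation, by contrast, would train the non-private classifier first and add similarly distributed noise to the learned weight vector afterward; the abstract's claim is that perturbing the objective instead gives a better privacy-utility tradeoff.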