
Differentially Private Empirical Risk Minimization

Authors

Kamalika Chaudhuri, Claire Monteleoni, Anand D. Sarwate

Affiliation

Department of Computer Science and Engineering, University of California, San Diego, La Jolla, CA 92093, USA.

Publication

J Mach Learn Res. 2011 Mar;12:1069-1109.

PMID: 21892342
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3164588/
Abstract

Privacy-preserving machine learning algorithms are crucial for the increasingly common setting in which personal data, such as medical or financial records, are analyzed. We provide general techniques to produce privacy-preserving approximations of classifiers learned via (regularized) empirical risk minimization (ERM). These algorithms are private under the ε-differential privacy definition due to Dwork et al. (2006). First, we apply the output perturbation ideas of Dwork et al. (2006) to ERM classification. Then we propose a new method, objective perturbation, for privacy-preserving machine learning algorithm design. This method entails perturbing the objective function before optimizing over classifiers. If the loss and regularizer satisfy certain convexity and differentiability criteria, we prove theoretical results showing that our algorithms preserve privacy, and we provide generalization bounds for linear and nonlinear kernels. We further present a privacy-preserving technique for tuning the parameters in general machine learning algorithms, thereby providing end-to-end privacy guarantees for the training process. We apply these results to produce privacy-preserving analogues of regularized logistic regression and support vector machines. We obtain encouraging results from evaluating their performance on real demographic and benchmark data sets. Our results show that, both theoretically and empirically, objective perturbation is superior to the previous state of the art, output perturbation, in managing the inherent tradeoff between privacy and learning performance.
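For reference, the ε-differential privacy guarantee invoked above, and a schematic form of the objective perturbation estimator the abstract describes, can be written as follows. This is only a notational restatement; the exact noise density and constants are calibrated in the paper. A randomized algorithm \mathcal{A} is ε-differentially private if, for every pair of datasets \mathcal{D}, \mathcal{D}' differing in a single record and every measurable set \mathcal{S} of outputs,

\Pr[\mathcal{A}(\mathcal{D}) \in \mathcal{S}] \;\le\; e^{\varepsilon} \, \Pr[\mathcal{A}(\mathcal{D}') \in \mathcal{S}].

Objective perturbation draws a random vector b and minimizes the perturbed regularized empirical risk

f_{\mathrm{priv}} \;=\; \arg\min_{f} \; \frac{1}{n} \sum_{i=1}^{n} \ell\left(f^{\top} x_i,\, y_i\right) \;+\; \frac{\Lambda}{2}\,\|f\|^{2} \;+\; \frac{1}{n}\, b^{\top} f,

whereas output perturbation minimizes the unperturbed objective and adds noise to the resulting minimizer.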

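A minimal runnable sketch of the two mechanisms for regularized logistic regression appears below. It assumes feature vectors with ||x_i|| ≤ 1 and labels in {−1, +1}, and it simplifies the paper's calibration (in particular, it omits the additional regularization and ε adjustment that objective perturbation requires when Λ is small). The helper names and toy data are illustrative, not from the paper.

# Illustrative sketch of output vs. objective perturbation for
# L2-regularized logistic regression, in the spirit of Chaudhuri,
# Monteleoni & Sarwate (2011). Noise scales follow the paper's
# Algorithms 1-2 in simplified form; a didactic sketch, not a vetted
# production implementation.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)

def logistic_objective(f, X, y, lam, b=None):
    """Regularized empirical risk, with an optional linear perturbation b."""
    margins = y * (X @ f)
    risk = np.mean(np.log1p(np.exp(-margins))) + 0.5 * lam * (f @ f)
    if b is not None:
        risk += (b @ f) / len(y)
    return risk

def sample_noise(d, scale, rng):
    """Vector with density proportional to exp(-||b|| / scale):
    uniform direction, Gamma(d, scale)-distributed norm."""
    direction = rng.normal(size=d)
    direction /= np.linalg.norm(direction)
    return direction * rng.gamma(shape=d, scale=scale)

def output_perturbation(X, y, lam, eps, rng):
    """Train non-privately, then add noise scaled to the minimizer's
    L2-sensitivity 2/(n * lam) (assumes ||x_i|| <= 1, y_i in {-1,+1})."""
    n, d = X.shape
    f_star = minimize(logistic_objective, np.zeros(d), args=(X, y, lam)).x
    return f_star + sample_noise(d, scale=2.0 / (n * lam * eps), rng=rng)

def objective_perturbation(X, y, lam, eps, rng):
    """Perturb the objective with a random linear term before optimizing
    (simplified: assumes lam is large enough that no extra regularization
    or epsilon adjustment is needed)."""
    n, d = X.shape
    b = sample_noise(d, scale=2.0 / eps, rng=rng)
    return minimize(logistic_objective, np.zeros(d), args=(X, y, lam, b)).x

# Toy data: linearly separable-ish Gaussian features, scaled into the unit ball.
n, d = 500, 5
X = rng.normal(size=(n, d))
y = np.sign(X[:, 0] + 0.1 * rng.normal(size=n))
X /= np.max(np.linalg.norm(X, axis=1))  # enforce ||x_i|| <= 1

for eps in (0.1, 1.0):
    f_out = output_perturbation(X, y, lam=1e-2, eps=eps, rng=rng)
    f_obj = objective_perturbation(X, y, lam=1e-2, eps=eps, rng=rng)
    acc = lambda f: np.mean(np.sign(X @ f) == y)
    print(f"eps={eps}: output-perturbation acc={acc(f_out):.3f}, "
          f"objective-perturbation acc={acc(f_obj):.3f}")

On toy data like this, the objective-perturbed classifier typically degrades more gracefully as ε shrinks, which is the privacy-utility tradeoff the abstract highlights.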

Similar Articles

1. Differentially Private Empirical Risk Minimization. J Mach Learn Res. 2011 Mar;12:1069-1109.
2. Privacy-Preserving Cost-Sensitive Learning. IEEE Trans Neural Netw Learn Syst. 2021 May;32(5):2105-2116. doi: 10.1109/TNNLS.2020.2996972. Epub 2021 May 4.
3. Guaranteed distributed machine learning: Privacy-preserving empirical risk minimization. Math Biosci Eng. 2021 Jun 1;18(4):4772-4796. doi: 10.3934/mbe.2021243.
4. Insuring against the perils in distributed learning: privacy-preserving empirical risk minimization. Math Biosci Eng. 2021 Mar 29;18(4):3006-3033. doi: 10.3934/mbe.2021151.
5. DPWSS: differentially private working set selection for training support vector machines. PeerJ Comput Sci. 2021 Dec 1;7:e799. doi: 10.7717/peerj-cs.799. eCollection 2021.
6. Data Obfuscation Through Latent Space Projection for Privacy-Preserving AI Governance: Case Studies in Medical Diagnosis and Finance Fraud Detection. JMIRx Med. 2025 Mar 12;6:e70100. doi: 10.2196/70100.
7. Differentially private multivariate time series forecasting of aggregated human mobility with deep learning: Input or gradient perturbation? Neural Comput Appl. 2022;34(16):13355-13369. doi: 10.1007/s00521-022-07393-0. Epub 2022 Jun 3.
8. Efficient differentially private learning improves drug sensitivity prediction. Biol Direct. 2018 Feb 6;13(1):1. doi: 10.1186/s13062-017-0203-4.
9. Quantum machine learning with differential privacy. Sci Rep. 2023 Feb 11;13(1):2453. doi: 10.1038/s41598-022-24082-z.
10. Sample Complexity Bounds for Differentially Private Learning. JMLR Workshop Conf Proc. 2011;2011:155-186.

Citing Articles

1. A Neural Approach to Spatio-Temporal Data Release with User-Level Differential Privacy. Proc ACM Manag Data. 2023 May;1(1). doi: 10.1145/3588701. Epub 2023 May 30.
2. Differential Privacy for Classifier Evaluation. AISec. 2015;2015:15-23. doi: 10.1145/2808769.2808775. Epub 2015 Oct 16.
3. Privacy for free in the overparameterized regime. Proc Natl Acad Sci U S A. 2025 Apr 15;122(15):e2423072122. doi: 10.1073/pnas.2423072122. Epub 2025 Apr 11.
4. Data Obfuscation Through Latent Space Projection for Privacy-Preserving AI Governance: Case Studies in Medical Diagnosis and Finance Fraud Detection. JMIRx Med. 2025 Mar 12;6:e70100. doi: 10.2196/70100.
5. Privacy-Enhancing Technologies in Biomedical Data Science. Annu Rev Biomed Data Sci. 2024 Aug;7(1):317-343. doi: 10.1146/annurev-biodatasci-120423-120107.
6. Does Differentially Private Synthetic Data Lead to Synthetic Discoveries? Methods Inf Med. 2024 May;63(1-02):35-51. doi: 10.1055/a-2385-1355. Epub 2024 Aug 13.
7. Self-learning activation functions to increase accuracy of privacy-preserving Convolutional Neural Networks with homomorphic encryption. PLoS One. 2024 Jul 22;19(7):e0306420. doi: 10.1371/journal.pone.0306420. eCollection 2024.
8. Anonymization: The imperfect science of using data while preserving privacy. Sci Adv. 2024 Jul 19;10(29):eadn7053. doi: 10.1126/sciadv.adn7053. Epub 2024 Jul 17.
9. Correlation inference attacks against machine learning models. Sci Adv. 2024 Jul 12;10(28):eadj9260. doi: 10.1126/sciadv.adj9260. Epub 2024 Jul 10.
10. A data-driven approach to choosing privacy parameters for clinical trial data sharing under differential privacy. J Am Med Inform Assoc. 2024 Apr 19;31(5):1135-1143. doi: 10.1093/jamia/ocae038.

References

1. Resolving individuals contributing trace amounts of DNA to highly complex mixtures using high-density SNP genotyping microarrays. PLoS Genet. 2008 Aug 29;4(8):e1000167. doi: 10.1371/journal.pgen.1000167.
2. Training a support vector machine in the primal. Neural Comput. 2007 May;19(5):1155-78. doi: 10.1162/neco.2007.19.5.1155.
3. Weaving technology and policy together to maintain confidentiality. J Law Med Ethics. 1997 Summer-Fall;25(2-3):98-110, 82. doi: 10.1111/j.1748-720x.1997.tb01885.x.