Efficient Privacy-preserving Machine Learning in Hierarchical Distributed System.

Authors

Jia Qi, Guo Linke, Fang Yuguang, Wang Guirong

Affiliations

Department of Electrical and Computer Engineering, Binghamton University, Binghamton, NY, 13850.

Department of Electrical and Computer Engineering, University of Florida, Gainesville, FL, 32611.

Publication information

IEEE Trans Netw Sci Eng. 2019 Oct-Dec;6(4):599-612. doi: 10.1109/tnse.2018.2859420. Epub 2018 Jul 24.

DOI:10.1109/tnse.2018.2859420

PMID:33748314

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC7970733/

Abstract

With the dramatic growth of data in both volume and scale, distributed machine learning has become an important tool for handling massive data in tasks such as prediction and classification. However, due to practical physical constraints and the potential privacy leakage of data, it is infeasible to aggregate raw data from all data owners for learning purposes. To tackle this problem, distributed privacy-preserving learning approaches have been introduced to learn over all distributed data without exposing the real information. However, existing approaches have limitations in complicated distributed systems. On the one hand, traditional privacy-preserving learning approaches rely on heavy cryptographic primitives applied to training data, and the resulting computation overhead dramatically slows down learning. On the other hand, the complicated system architecture becomes a barrier in practical distributed systems. In this paper, we propose an efficient privacy-preserving machine learning scheme for hierarchical distributed systems. We modify and improve the collaborative learning algorithm. The proposed scheme not only reduces the overhead of the learning process but also provides comprehensive protection for each layer of the hierarchical distributed system. In addition, based on an analysis of the collaborative convergence in different learning groups, we also propose an asynchronous strategy to further improve the learning efficiency of hierarchical distributed systems. Finally, extensive experiments on real-world data are conducted to evaluate the privacy, efficacy, and efficiency of our proposed schemes.

Similar articles

Efficient Privacy-preserving Machine Learning in Hierarchical Distributed System.

IEEE Trans Netw Sci Eng. 2019 Oct-Dec;6(4):599-612. doi: 10.1109/tnse.2018.2859420. Epub 2018 Jul 24.

High performance logistic regression for privacy-preserving genome analysis.

BMC Med Genomics. 2021 Jan 20;14(1):23. doi: 10.1186/s12920-020-00869-9.

Practical and Robust Federated Learning With Highly Scalable Regression Training.

IEEE Trans Neural Netw Learn Syst. 2024 Oct;35(10):13801-13815. doi: 10.1109/TNNLS.2023.3271859. Epub 2024 Oct 7.

Guaranteed distributed machine learning: Privacy-preserving empirical risk minimization.

Math Biosci Eng. 2021 Jun 1;18(4):4772-4796. doi: 10.3934/mbe.2021243.

Privacy-preserving federated machine learning on FAIR health data: A real-world application.

Comput Struct Biotechnol J. 2024 Feb 17;24:136-145. doi: 10.1016/j.csbj.2024.02.014. eCollection 2024 Dec.

Insuring against the perils in distributed learning: privacy-preserving empirical risk minimization.

Math Biosci Eng. 2021 Mar 29;18(4):3006-3033. doi: 10.3934/mbe.2021151.

A collaborative framework for Distributed Privacy-Preserving Support Vector Machine learning.

AMIA Annu Symp Proc. 2012;2012:1350-9. Epub 2012 Nov 3.

Decentralised, collaborative, and privacy-preserving machine learning for multi-hospital data.

EBioMedicine. 2024 Mar;101:105006. doi: 10.1016/j.ebiom.2024.105006. Epub 2024 Feb 19.

Learning from vertically distributed data across multiple sites: An efficient privacy-preserving algorithm for Cox proportional hazards model with variable selection.

J Biomed Inform. 2024 Jan;149:104581. doi: 10.1016/j.jbi.2023.104581. Epub 2023 Dec 23.

Privacy-preserving model learning on a blockchain network-of-networks.

J Am Med Inform Assoc. 2020 Mar 1;27(3):343-354. doi: 10.1093/jamia/ocz214.
