Jia Qi, Guo Linke, Fang Yuguang, Wang Guirong
Department of Electrical and Computer Engineering, Binghamton University, Binghamton, NY, 13850.
Department of Electrical and Computer Engineering, University of Florida, Gainesville, FL, 32611.
IEEE Trans Netw Sci Eng. 2019 Oct-Dec;6(4):599-612. doi: 10.1109/tnse.2018.2859420. Epub 2018 Jul 24.
With the dramatic growth of data in both volume and scale, distributed machine learning has become an important tool for completing tasks such as prediction and classification over massive data. However, because of practical physical constraints and the potential privacy leakage of data, it is infeasible to aggregate raw data from all data owners for learning. To tackle this problem, distributed privacy-preserving learning approaches have been introduced to learn over all distributed data without exposing the real information. Existing approaches, however, have limitations in complicated distributed systems. On the one hand, traditional privacy-preserving learning approaches rely on heavy cryptographic primitives applied to the training data, whose computation overhead dramatically slows learning. On the other hand, the complicated system architecture becomes a barrier in practical distributed systems. In this paper, we propose an efficient privacy-preserving machine learning scheme for hierarchical distributed systems. We modify and improve the collaborative learning algorithm. The proposed scheme not only reduces the overhead of the learning process but also provides comprehensive protection for each layer of the hierarchical distributed system. In addition, based on an analysis of collaborative convergence in different learning groups, we propose an asynchronous strategy to further improve the learning efficiency of hierarchical distributed systems. Finally, extensive experiments on real-world data are conducted to evaluate the privacy, efficacy, and efficiency of the proposed schemes.
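The abstract does not give the details of the proposed scheme, so the following is only a generic illustration of learning in a hierarchical distributed system without aggregating raw data. It is a minimal sketch under stated assumptions: a one-parameter least-squares model, equally sized groups, plain parameter averaging at each layer, and synchronous rounds. All names (`local_step`, `train_hierarchy`, `make_owner`) are hypothetical, and the sketch implements neither the paper's privacy protection nor its asynchronous strategy.

```python
import random


def local_step(w, data, lr=0.1):
    """One gradient step of 1-D least squares on an owner's private data."""
    grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
    return w - lr * grad


def average(values):
    """Unweighted mean (assumes equally sized owners and groups)."""
    return sum(values) / len(values)


def train_hierarchy(groups, rounds=300, lr=0.1):
    """groups: list of groups; each group is a list of owners' datasets.

    Raw data never leaves an owner. Only updated parameters move up
    the hierarchy: owner -> group aggregator -> top-level aggregator.
    """
    w = 0.0
    for _ in range(rounds):
        group_models = []
        for owners in groups:
            # Each owner refines the current parameter on its own data.
            owner_models = [local_step(w, data, lr) for data in owners]
            # The group aggregator sees parameters only, not data.
            group_models.append(average(owner_models))
        # The top-level aggregator combines the group-level parameters.
        w = average(group_models)
    return w


def make_owner(rng, n=20, slope=3.0):
    """Synthetic private dataset (assumption): noise-free y = slope * x."""
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    return [(x, slope * x) for x in xs]


rng = random.Random(0)
# Two groups of three data owners each.
groups = [[make_owner(rng) for _ in range(3)] for _ in range(2)]
w = train_hierarchy(groups)
print(f"learned slope: {w:.4f}")
```

In this toy setting the learned parameter approaches the slope used to generate the synthetic data, even though no aggregator ever reads an owner's samples.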