

Performance and Information Leakage in Splitfed Learning and Multi-Head Split Learning in Healthcare Data and Beyond.

Authors

Joshi Praveen, Thapa Chandra, Camtepe Seyit, Hasanuzzaman Mohammed, Scully Ted, Afli Haithem

Affiliations

Department of Computer Sciences, Munster Technological University, MTU, T12 P928 Cork, Ireland.

CSIRO Data61, Marsfield, NSW 2122, Australia.

Publication

Methods Protoc. 2022 Jul 13;5(4):60. doi: 10.3390/mps5040060.

Abstract

Machine learning (ML) in healthcare data analytics is attracting much attention because of the unprecedented power of ML to extract knowledge that improves the decision-making process. At the same time, the laws and ethics codes drafted by countries to govern healthcare data are becoming stringent. While healthcare practitioners struggle with these enforced governance frameworks, distributed learning-based frameworks are emerging that disrupt traditional ML model development. Splitfed learning (SFL) is one of the recent developments in distributed machine learning that enables healthcare practitioners to train ML models while preserving the privacy of input data. However, SFL incurs extra communication and computation overheads on the client side because it requires client-side model synchronization. For resource-constrained clients (hospitals with limited computational power), removing this requirement is necessary to make learning efficient. This paper therefore studies SFL without client-side model synchronization; the resulting architecture is known as multi-head split learning (MHSL). It is equally important to investigate information leakage, i.e., how much information about the raw data the server can recover directly from the smashed data (the output of the client-side model portion) passed to it by the client. Our empirical studies examine ResNet-18 and Conv1-D architectures on the ECG and HAM-10000 datasets under an IID data distribution. The results show that SFL achieves 1.81% and 2.36% better accuracy than MHSL on the ECG and HAM-10000 datasets, respectively (with the cut layer set to 1). Experiments with various client-side model portions show that this choice affects overall performance: as the number of layers in the client-side portion increases, SFL performance improves while MHSL performance degrades.
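The synchronization step that distinguishes the two architectures can be sketched in a few lines. This is a toy illustration, not the authors' code: client-side model portions are represented as flat weight lists, and the `fedavg` helper is an assumed name for the federated-averaging step SFL performs after each round, which MHSL omits.

```python
# Toy sketch of SFL vs. MHSL client-side handling (not the authors' code).
# Each client-side model portion is modeled as a flat list of weights.

def fedavg(client_weights):
    """Element-wise average of the client-side model portions (SFL sync)."""
    n = len(client_weights)
    return [sum(ws) / n for ws in zip(*client_weights)]

# Three clients' model portions after a local training round.
clients = [
    [0.2, 0.4, 0.6],
    [0.4, 0.6, 0.8],
    [0.6, 0.8, 1.0],
]

# SFL: every client continues the next round from one averaged portion,
# at the cost of an extra communication/aggregation step.
sfl_model = fedavg(clients)

# MHSL: each client keeps its own independent head; no averaging and
# no extra synchronization traffic.
mhsl_models = clients
```

Skipping `fedavg` is exactly where MHSL saves client-side communication and computation, at the price of the accuracy gap reported above.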
The experiments also show that, as measured by mutual information score, information leakage in SFL exceeds that in MHSL by 2×10⁻⁵ and 4×10⁻³ on the ECG and HAM-10000 datasets, respectively.
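The mutual information score used to quantify leakage measures how much knowing the smashed data reduces uncertainty about the raw input. As a rough sketch under stated assumptions (discrete-valued toy data, not the paper's ECG/HAM-10000 pipeline), the empirical score can be computed directly from co-occurrence counts:

```python
import math
from collections import Counter

def mutual_info_score(x, y):
    """Empirical mutual information (in nats) between two discrete sequences:
    MI = sum over (a, b) of p(a, b) * log(p(a, b) / (p(a) * p(b)))."""
    n = len(x)
    px, py = Counter(x), Counter(y)
    pxy = Counter(zip(x, y))
    mi = 0.0
    for (a, b), c in pxy.items():
        p_ab = c / n
        # p(a, b) / (p(a) * p(b)) written with raw counts to avoid three divisions
        mi += p_ab * math.log(c * n / (px[a] * py[b]))
    return mi

# Toy example: discretized raw samples vs. their 'smashed' outputs
# (the client-side model portion's activations, coarsely binned).
raw = [0, 0, 1, 1, 2, 2, 3, 3]
smashed = [0, 0, 0, 0, 1, 1, 1, 1]
leak = mutual_info_score(raw, smashed)  # higher score means more leakage
```

A score of 0 would mean the smashed data reveals nothing about the raw input; the small positive gaps reported above indicate SFL's smashed data is marginally more informative to the server than MHSL's.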


Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3dfd/9326525/09737f69f294/mps-05-00060-g001.jpg
