

Performance and Information Leakage in Splitfed Learning and Multi-Head Split Learning in Healthcare Data and Beyond.

Authors

Joshi Praveen, Thapa Chandra, Camtepe Seyit, Hasanuzzaman Mohammed, Scully Ted, Afli Haithem

Affiliations

Department of Computer Sciences, Munster Technological University, MTU, T12 P928 Cork, Ireland.

CSIRO Data61, Marsfield, NSW 2122, Australia.

Publication

Methods Protoc. 2022 Jul 13;5(4):60. doi: 10.3390/mps5040060.

Abstract

Machine learning (ML) in healthcare data analytics is attracting much attention because of the unprecedented power of ML to extract knowledge that improves the decision-making process. At the same time, the laws and ethics codes drafted by countries to govern healthcare data are becoming stringent. While healthcare practitioners struggle with these enforced governance frameworks, distributed learning-based frameworks are emerging that disrupt traditional ML model development. Splitfed learning (SFL) is one of the recent developments in distributed machine learning that enables healthcare practitioners to train ML models while preserving the privacy of input data. However, SFL incurs extra communication and computation overheads on the client side because it requires client-side model synchronization. For resource-constrained clients (hospitals with limited computational power), removing this requirement is necessary to make learning efficient. This paper therefore studies SFL without client-side model synchronization; the resulting architecture is known as multi-head split learning (MHSL). It is equally important to investigate information leakage, i.e., how much information about the raw data the server can recover directly from the smashed data (the output of the client-side model portion) passed to it by the client. Our empirical studies examine ResNet-18 and Conv1-D architectures on the ECG and HAM-10000 datasets under an IID data distribution. The results show that SFL achieves 1.81% and 2.36% better accuracy than MHSL on the ECG and HAM-10000 datasets, respectively (with the cut layer set to 1). Experiments with various client-side model portions show that this choice affects overall performance: as the number of layers in the client-side portion increases, SFL performance improves while MHSL performance degrades.
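The synchronization step that distinguishes the two architectures can be sketched in a few lines. This is a toy illustration, not the authors' code: client-side model portions are represented as flat weight lists, and the `fedavg` helper is an assumed name for the federated-averaging step SFL performs after each round, which MHSL omits.

```python
# Toy sketch of SFL vs. MHSL client-side handling (not the authors' code).
# Each client-side model portion is modeled as a flat list of weights.

def fedavg(client_weights):
    """Element-wise average of the client-side model portions (SFL sync)."""
    n = len(client_weights)
    return [sum(ws) / n for ws in zip(*client_weights)]

# Three clients' model portions after a local training round.
clients = [
    [0.2, 0.4, 0.6],
    [0.4, 0.6, 0.8],
    [0.6, 0.8, 1.0],
]

# SFL: every client continues the next round from one averaged portion,
# at the cost of an extra communication/aggregation step.
sfl_model = fedavg(clients)

# MHSL: each client keeps its own independent head; no averaging and
# no extra synchronization traffic.
mhsl_models = clients
```

Skipping `fedavg` is exactly where MHSL saves client-side communication and computation, at the price of the accuracy gap reported above.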
The experiments also show that, as measured by mutual information score, information leakage in SFL exceeds that in MHSL by 2×10⁻⁵ and 4×10⁻³ on the ECG and HAM-10000 datasets, respectively.
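The mutual information score used to quantify leakage measures how much knowing the smashed data reduces uncertainty about the raw input. As a rough sketch under stated assumptions (discrete-valued toy data, not the paper's ECG/HAM-10000 pipeline), the empirical score can be computed directly from co-occurrence counts:

```python
import math
from collections import Counter

def mutual_info_score(x, y):
    """Empirical mutual information (in nats) between two discrete sequences:
    MI = sum over (a, b) of p(a, b) * log(p(a, b) / (p(a) * p(b)))."""
    n = len(x)
    px, py = Counter(x), Counter(y)
    pxy = Counter(zip(x, y))
    mi = 0.0
    for (a, b), c in pxy.items():
        p_ab = c / n
        # p(a, b) / (p(a) * p(b)) written with raw counts to avoid three divisions
        mi += p_ab * math.log(c * n / (px[a] * py[b]))
    return mi

# Toy example: discretized raw samples vs. their 'smashed' outputs
# (the client-side model portion's activations, coarsely binned).
raw = [0, 0, 1, 1, 2, 2, 3, 3]
smashed = [0, 0, 0, 0, 1, 1, 1, 1]
leak = mutual_info_score(raw, smashed)  # higher score means more leakage
```

A score of 0 would mean the smashed data reveals nothing about the raw input; the small positive gaps reported above indicate SFL's smashed data is marginally more informative to the server than MHSL's.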


Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3dfd/9326525/09737f69f294/mps-05-00060-g001.jpg
