Joshi Praveen, Thapa Chandra, Camtepe Seyit, Hasanuzzaman Mohammed, Scully Ted, Afli Haithem
Department of Computer Sciences, Munster Technological University (MTU), T12 P928 Cork, Ireland.
CSIRO Data61, Marsfield, NSW 2122, Australia.
Methods Protoc. 2022 Jul 13;5(4):60. doi: 10.3390/mps5040060.
Machine learning (ML) in healthcare data analytics is attracting much attention because of its unprecedented power to extract knowledge that improves decision-making. At the same time, the laws and ethics codes that countries draft to govern healthcare data are becoming more stringent. While healthcare practitioners struggle with these enforced governance frameworks, distributed-learning-based frameworks are emerging that disrupt traditional ML model development. Splitfed learning (SFL) is one of the recent developments in distributed machine learning that enables healthcare practitioners to preserve the privacy of input data while still training ML models. However, SFL incurs extra communication and computation overhead on the client side because it requires client-side model synchronization. For resource-constrained clients (e.g., hospitals with limited computational power), removing this requirement is necessary to make learning efficient. In this regard, this paper studies SFL without client-side model synchronization; the resulting architecture is known as multi-head split learning (MHSL). It is equally important to investigate information leakage, i.e., how much information about the raw data the server gains from the smashed data (the output of the client-side model portion) passed to it by the client. Our empirical studies examine ResNet-18 and Conv1D architectures on the ECG and HAM-10000 datasets under an IID data distribution. The results show that SFL achieves 1.81% and 2.36% better accuracy than MHSL on the ECG and HAM-10000 datasets, respectively (with the cut layer set to 1). Experiments with client-side model portions of varying depth demonstrate that this choice affects overall performance: as the number of layers in the client-side portion increases, SFL performance improves while MHSL performance degrades. The results also show that information leakage, measured by mutual information scores, is higher in SFL than in MHSL by 2×10⁻⁵ on the ECG dataset and 4×10⁻³ on the HAM-10000 dataset.
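To make the client/server split described above concrete, below is a minimal PyTorch sketch of one split-learning training step for a single client. It is an illustrative sketch under stated assumptions, not the authors' implementation: the toy 1D-CNN, its layer sizes, the 187-sample ECG input length, and the five-class output are all hypothetical. The client computes the smashed data at the cut layer and sends it to the server; the server finishes the forward pass, backpropagates, and returns the gradient of the smashed data so the client can update its portion.

```python
import torch
import torch.nn as nn

# Hypothetical 1D-CNN for ECG beats, split at a cut layer: the client keeps
# the layers up to the cut; the server holds the rest. Raw ECG never leaves
# the client; only the cut-layer activations ("smashed data") are transmitted.

class ClientPart(nn.Module):
    """Client-side model portion (up to the cut layer)."""
    def __init__(self):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=7, padding=3),
            nn.ReLU(),
            nn.MaxPool1d(2),
        )

    def forward(self, x):
        return self.block(x)  # smashed data sent to the server

class ServerPart(nn.Module):
    """Server-side model portion (from the cut layer to the output)."""
    def __init__(self, num_classes=5):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv1d(16, 32, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),
            nn.Flatten(),
            nn.Linear(32, num_classes),
        )

    def forward(self, smashed):
        return self.block(smashed)

client, server = ClientPart(), ServerPart()
opt_c = torch.optim.SGD(client.parameters(), lr=1e-2)
opt_s = torch.optim.SGD(server.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()

x = torch.randn(8, 1, 187)        # dummy ECG batch (187 samples per beat, assumed)
y = torch.randint(0, 5, (8,))     # dummy labels

# --- one training step ---
smashed = client(x)
received = smashed.detach().requires_grad_()   # what the server actually receives
loss = loss_fn(server(received), y)

opt_s.zero_grad()
loss.backward()                    # fills server grads and received.grad
opt_s.step()                       # server-side update

opt_c.zero_grad()
smashed.backward(received.grad)    # gradient of smashed data, sent back to client
opt_c.step()                       # client-side update
```

In SFL, a round additionally ends with the client-side portions of all participating clients being synchronized (e.g., averaged at a fed server); MHSL simply omits that step, each client keeping its own head, which is the source of the overhead savings and the accuracy gap reported above.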