Mitrovska Angela, Safari Pooyan, Ritter Kerstin, Shariati Behnam, Fischer Johannes Karl
Fraunhofer-Institut für Nachrichtentechnik, Heinrich-Hertz-Institute (HHI), Berlin, Germany.
Bernstein Center for Computational Neuroscience, Berlin, Germany.
Front Aging Neurosci. 2024 Mar 7;16:1324032. doi: 10.3389/fnagi.2024.1324032. eCollection 2024.
Machine Learning (ML) is considered a promising tool to aid and accelerate diagnosis in various medical areas, including neuroimaging. However, its success is hindered by the lack of large-scale public datasets. Medical institutions hold large amounts of data, but legal requirements to protect patient privacy prevent them from releasing it publicly. Federated Learning (FL) is a viable alternative that can overcome this issue. This work proposes training an ML model for Alzheimer's Disease (AD) detection based on structural MRI (sMRI) data in a federated setting. We implement two aggregation algorithms, Federated Averaging (FedAvg) and Secure Aggregation (SecAgg), and compare their performance with centralized ML model training. We simulate heterogeneous environments and explore the impact of demographic (sex, age, and diagnosis) and imbalanced data distributions. The simulated heterogeneous environments allow us to observe the effect of these statistical differences on ML models trained using FL, and they highlight the importance of studying such differences when training ML models for AD detection. Moreover, as part of the evaluation, we demonstrate the increased privacy guarantees of FL with SecAgg via simulated membership inference attacks.
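To make the two aggregation schemes named in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes each client's model is a flat list of floats and that client sample counts are public. The function names (`fedavg`, `pairwise_masks`, `secure_fedavg`) are hypothetical. The masking step is a simplified illustration of the idea behind secure aggregation: pairwise random masks cancel in the sum, so the server sees only masked updates. Practical protocols additionally rely on key agreement, finite-field arithmetic, and dropout handling, all omitted here.

```python
import random


def fedavg(client_weights, client_sizes):
    """FedAvg aggregation step: average client parameters weighted by sample count."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[k] * n for w, n in zip(client_weights, client_sizes)) / total
        for k in range(dim)
    ]


def pairwise_masks(num_clients, dim, seed=0):
    """Toy pairwise masks: for each pair i < j, client i adds a random vector
    and client j subtracts the same vector, so all masks cancel in the sum."""
    rng = random.Random(seed)  # assumption: stands in for pairwise shared secrets
    masks = [[0.0] * dim for _ in range(num_clients)]
    for i in range(num_clients):
        for j in range(i + 1, num_clients):
            shared = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
            for k in range(dim):
                masks[i][k] += shared[k]
                masks[j][k] -= shared[k]
    return masks


def secure_fedavg(client_weights, client_sizes, seed=0):
    """Same weighted average as fedavg, but the server only ever sums masked updates."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    masks = pairwise_masks(len(client_weights), dim, seed)
    # Each client uploads its size-scaled parameters plus its own mask.
    masked_updates = [
        [w[k] * n + m[k] for k in range(dim)]
        for w, n, m in zip(client_weights, client_sizes, masks)
    ]
    # The server sums the masked updates; the masks cancel up to rounding error.
    return [sum(u[k] for u in masked_updates) / total for k in range(dim)]


if __name__ == "__main__":
    weights = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]  # made-up toy parameters
    sizes = [10, 30, 60]                             # made-up sample counts
    print(fedavg(weights, sizes))
    print(secure_fedavg(weights, sizes))
```

Both functions should return the same aggregate up to floating-point rounding. The difference lies in what the server observes: individual client updates in the first case, and only masked values in the second.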