Learning From Others Without Sacrificing Privacy: Simulation Comparing Centralized and Federated Machine Learning on Mobile Health Data.
Affiliations
Department of Statistics, University of Michigan, Ann Arbor, MI, United States.
Molecular and Behavioral Neuroscience Institute, University of Michigan, Ann Arbor, MI, United States.
Publication information
JMIR Mhealth Uhealth. 2021 Mar 30;9(3):e23728. doi: 10.2196/23728.
BACKGROUND
The use of wearables facilitates data collection at a previously unobtainable scale, enabling the construction of complex predictive models with the potential to improve health. However, the highly personal nature of these data requires strong privacy protection against data breaches and the use of data in a way that users do not intend. One method to protect user privacy while taking advantage of sharing data across users is federated learning, a technique that allows a machine learning model to be trained using data from all users while only storing a user's data on that user's device. By keeping data on users' devices, federated learning protects users' private data from data leaks and breaches on the researcher's central server and provides users with more control over how and when their data are used. However, there are few rigorous studies on the effectiveness of federated learning in the mobile health (mHealth) domain.
OBJECTIVE
We review federated learning and assess whether it can be useful in the mHealth field, especially for addressing common mHealth challenges such as privacy concerns and user heterogeneity. The aims of this study are to describe federated learning in an mHealth context, apply a simulation of federated learning to an mHealth data set, and compare the performance of federated learning with the performance of other predictive models.
METHODS
We applied a simulation of federated learning to predict the affective state of 15 subjects using physiological and motion data collected from a chest-worn device for approximately 36 minutes. We compared the results from this federated model with those from a centralized or server model and with the results from training individual models for each subject.
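The abstract does not state which aggregation algorithm or model architecture was used, so the following is only a minimal sketch of how a federated simulation of this kind might look. It assumes federated averaging (each simulated client trains locally, and only model weights are averaged centrally), a binary logistic regression instead of the 3-class problem, and synthetic data. All function names and parameters (`local_update`, `federated_average`, `simulate`, `make_clients`, `accuracy`) are hypothetical and chosen for illustration.

```python
import numpy as np


def local_update(weights, X, y, lr=0.1, epochs=5):
    """Train on one client's data only; the raw data never leave this function."""
    w = weights.copy()
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w)))   # predicted probabilities
        grad = X.T @ (p - y) / len(y)        # logistic-loss gradient
        w -= lr * grad
    return w


def federated_average(client_weights, client_sizes):
    """Average client models, weighting each by its local sample count."""
    sizes = np.asarray(client_sizes, dtype=float)
    stacked = np.stack(client_weights)
    return (stacked * sizes[:, None]).sum(axis=0) / sizes.sum()


def simulate(clients, n_features, rounds=20):
    """Run several rounds: broadcast weights, train locally, average."""
    w = np.zeros(n_features)
    for _ in range(rounds):
        updates = [local_update(w, X, y) for X, y in clients]
        w = federated_average(updates, [len(y) for _, y in clients])
    return w


def make_clients(n_clients=15, n_samples=200, n_features=4, seed=0):
    """Synthetic stand-in for per-subject data (assumption, not the study data)."""
    rng = np.random.default_rng(seed)
    true_w = rng.normal(size=n_features)
    clients = []
    for _ in range(n_clients):
        X = rng.normal(size=(n_samples, n_features))
        y = (X @ true_w + 0.1 * rng.normal(size=n_samples) > 0).astype(float)
        clients.append((X, y))
    return clients


def accuracy(w, clients):
    """Pooled classification accuracy of weights w across all clients."""
    correct = sum(int((((X @ w) > 0) == (y > 0.5)).sum()) for X, y in clients)
    total = sum(len(y) for _, y in clients)
    return correct / total


if __name__ == "__main__":
    clients = make_clients()
    w = simulate(clients, n_features=4)
    print(f"federated accuracy on synthetic data: {accuracy(w, clients):.3f}")
```

Under the same assumptions, a server model would correspond to fitting one model on the concatenated data from all clients, and an individual model to calling `local_update` for each client without the averaging step.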
RESULTS
In a 3-class classification problem using physiological and motion data to predict whether the subject was undertaking a neutral, amusing, or stressful task, the federated model achieved 92.8% accuracy on average, the server model achieved 93.2% accuracy on average, and the individual model achieved 90.2% accuracy on average.
CONCLUSIONS
Our findings support the potential for using federated learning in mHealth. The results showed that the federated model performed better than a model trained separately on each individual and nearly as well as the server model. As federated learning offers more privacy than a server model, it may be a valuable option for designing sensitive data collection methods.