Learning From Others Without Sacrificing Privacy: Simulation Comparing Centralized and Federated Machine Learning on Mobile Health Data.

Author Affiliations

Department of Statistics, University of Michigan, Ann Arbor, MI, United States.

Molecular and Behavioral Neuroscience Institute, University of Michigan, Ann Arbor, MI, United States.

Publication Information

JMIR Mhealth Uhealth. 2021 Mar 30;9(3):e23728. doi: 10.2196/23728.

DOI:10.2196/23728

PMID:33783362

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC8044739/

Abstract

BACKGROUND

The use of wearables facilitates data collection at a previously unobtainable scale, enabling the construction of complex predictive models with the potential to improve health. However, the highly personal nature of these data requires strong privacy protection against data breaches and the use of data in a way that users do not intend. One method to protect user privacy while taking advantage of sharing data across users is federated learning, a technique that allows a machine learning model to be trained using data from all users while only storing a user's data on that user's device. By keeping data on users' devices, federated learning protects users' private data from data leaks and breaches on the researcher's central server and provides users with more control over how and when their data are used. However, there are few rigorous studies on the effectiveness of federated learning in the mobile health (mHealth) domain.

OBJECTIVE

We review federated learning and assess whether it can be useful in the mHealth field, especially for addressing common mHealth challenges such as privacy concerns and user heterogeneity. The aims of this study are to describe federated learning in an mHealth context, apply a simulation of federated learning to an mHealth data set, and compare the performance of federated learning with the performance of other predictive models.

METHODS

We applied a simulation of federated learning to predict the affective state of 15 subjects using physiological and motion data collected from a chest-worn device for approximately 36 minutes. We compared the results from this federated model with those from a centralized or server model and with the results from training individual models for each subject.
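To make the simulated setup concrete, here is a minimal sketch of one common way to simulate federated learning: federated averaging with a softmax classifier. The abstract does not state which algorithm or model the authors used, so the algorithm choice, the synthetic data generator, and every name and hyperparameter below (`make_synthetic_clients`, `local_update`, `federated_averaging`, the learning rate, the number of rounds) are illustrative assumptions, not the study's method or data.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # stabilise the exponentials
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def local_update(W, X, y, lr=0.1, epochs=5):
    """Run a few gradient steps on one client's data, starting from the global weights."""
    W = W.copy()
    Y = np.eye(W.shape[1])[y]  # one-hot labels
    for _ in range(epochs):
        grad = X.T @ (softmax(X @ W) - Y) / len(X)
        W -= lr * grad
    return W

def federated_averaging(clients, n_classes, rounds=20):
    """clients: list of (X, y) pairs; raw data is only touched inside local_update."""
    n_features = clients[0][0].shape[1]
    W = np.zeros((n_features, n_classes))
    sizes = np.array([len(X) for X, _ in clients], dtype=float)
    for _ in range(rounds):
        local_weights = [local_update(W, X, y) for X, y in clients]
        # The simulated server sees only weight matrices, averaged by sample count.
        W = np.average(np.stack(local_weights), axis=0, weights=sizes)
    return W

def make_synthetic_clients(n_clients=15, n_per_client=60, n_classes=3, seed=0):
    """Hypothetical stand-in for per-subject data: 3 classes, 4 features plus a bias column."""
    rng = np.random.default_rng(seed)
    centers = 3.0 * np.eye(n_classes, 4)  # one well-separated center per class
    clients = []
    for _ in range(n_clients):
        y = rng.integers(0, n_classes, size=n_per_client)
        shift = rng.normal(scale=0.3, size=4)  # mild per-user heterogeneity
        X = centers[y] + shift + rng.normal(size=(n_per_client, 4))
        X = np.hstack([X, np.ones((n_per_client, 1))])  # bias term
        clients.append((X, y))
    return clients

def accuracy(W, X, y):
    return float((np.argmax(X @ W, axis=1) == y).mean())

clients = make_synthetic_clients()
W_fed = federated_averaging(clients, n_classes=3)
X_all = np.vstack([X for X, _ in clients])
y_all = np.concatenate([y for _, y in clients])
print("federated accuracy on pooled synthetic data:", accuracy(W_fed, X_all, y_all))
```

Under the same assumptions, a server-style baseline would call `local_update` once on the pooled `X_all, y_all`, and an individual baseline would call it separately on each client's `(X, y)`. That gives the three-way comparison described above.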

RESULTS

In a 3-class classification problem using physiological and motion data to predict whether the subject was undertaking a neutral, amusing, or stressful task, the federated model achieved 92.8% accuracy on average, the server model achieved 93.2% accuracy on average, and the individual model achieved 90.2% accuracy on average.

CONCLUSIONS

Our findings support the potential for using federated learning in mHealth. The results showed that the federated model performed better than a model trained separately on each individual and nearly as well as the server model. As federated learning offers more privacy than a server model, it may be a valuable option for designing sensitive data collection methods.
