Privacy preserving Generative Adversarial Networks to model Electronic Health Records.

Affiliations

School of Computer Science and Electronic Engineering, University of Essex, Wivenhoe Park, Colchester CO4 3SQ, United Kingdom.

Publication information

Neural Netw. 2022 Sep;153:339-348. doi: 10.1016/j.neunet.2022.06.022. Epub 2022 Jun 25.

Abstract

Hospitals and General Practitioner (GP) surgeries within the National Health Service (NHS) routinely collect patient information to create personal health records covering, for example, family medical history, chronic diseases, medications and dosing. The collected information could be used to build various machine learning models that simplify the work of NHS staff. However, such Electronic Health Records are not made publicly available due to privacy concerns. In our paper, we propose a privacy-preserving Generative Adversarial Network (pGAN), which can generate high-quality synthetic data while preserving the privacy and statistical properties of the source data. pGAN is evaluated on two distinct datasets, one framed as a classification task and the other as a regression task. The privacy score of the generated data is calculated using the Nearest Neighbour Adversarial Accuracy. Cosine similarity scores of synthetic data from our proposed model indicate that the generated data is similar in nature to the source data, but not identical. Additionally, our proposed model was able to preserve privacy while maintaining high utility. Machine learning models trained on synthetic data and on original data achieved accuracies of 74.3% and 74.5%, respectively, on the classification dataset, and R2 scores of 0.84 and 0.85, respectively, on the regression task. Our results, therefore, indicate that synthetic data from the proposed model could replace original data for machine learning while preserving privacy.
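
The abstract names two record-level checks, Nearest Neighbour Adversarial Accuracy and cosine similarity, without defining them. The sketch below shows one common way such scores are computed; it is an illustration under stated assumptions, not the authors' implementation. It assumes Euclidean distance, numeric feature matrices with no all-zero rows, and the usual literature definition of the adversarial accuracy (the average of two leave-one-out nearest-neighbour indicator terms). The function names `nn_adversarial_accuracy` and `max_cosine_similarity` are hypothetical.

```python
import numpy as np


def _pairwise_dist(a, b):
    """Euclidean distance between every row of a and every row of b."""
    diff = a[:, None, :] - b[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def nn_adversarial_accuracy(real, synth):
    """Assumed definition: 0.5 * (P[real row is closer to another real row
    than to any synthetic row] + P[synthetic row is closer to another
    synthetic row than to any real row])."""
    d_rs = _pairwise_dist(real, synth)
    d_rr = _pairwise_dist(real, real)
    d_ss = _pairwise_dist(synth, synth)
    # Leave-one-out: a row must not count as its own nearest neighbour.
    np.fill_diagonal(d_rr, np.inf)
    np.fill_diagonal(d_ss, np.inf)
    real_term = np.mean(d_rs.min(axis=1) > d_rr.min(axis=1))
    synth_term = np.mean(d_rs.min(axis=0) > d_ss.min(axis=1))
    return 0.5 * (real_term + synth_term)


def max_cosine_similarity(real, synth):
    """For each synthetic row, the highest cosine similarity to any real row."""
    r = real / np.linalg.norm(real, axis=1, keepdims=True)
    s = synth / np.linalg.norm(synth, axis=1, keepdims=True)
    return (s @ r.T).max(axis=1)


# Toy data standing in for real and generated records (not from the paper).
rng = np.random.default_rng(0)
real = rng.normal(size=(200, 5))
synth = rng.normal(size=(200, 5))
print("adversarial accuracy:", nn_adversarial_accuracy(real, synth))
print("mean max cosine similarity:", max_cosine_similarity(real, synth).mean())
```

Under this definition, a score near 0.5 means real and synthetic records are hard to tell apart by nearest neighbour, a score near 0 suggests the generator is copying source records, and a score near 1 suggests the synthetic data is easily distinguished from the real data. A maximum cosine similarity of exactly 1 for a synthetic row would likewise point to a record that matches a source record in direction, which is the "similar but not identical" property the abstract reports checking.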
