EHR-Safe: generating high-fidelity and privacy-preserving synthetic electronic health records.

Author information

Yoon Jinsung, Mizrahi Michel, Ghalaty Nahid Farhady, Jarvinen Thomas, Ravi Ashwin S, Brune Peter, Kong Fanyu, Anderson Dave, Lee George, Meir Arie, Bandukwala Farhana, Kanal Elli, Arık Sercan Ö, Pfister Tomas

Affiliations

Google Cloud, 1155 Borregas Ave, Sunnyvale, CA, USA.

Google LLC, 1600 Amphitheatre Pkwy, Mountain View, CA, USA.

Publication information

NPJ Digit Med. 2023 Aug 11;6(1):141. doi: 10.1038/s41746-023-00888-7.

Abstract

Privacy concerns are often the key bottleneck for sharing data between consumers and data holders, particularly for sensitive data such as electronic health records (EHR). This impedes the application of data analytics and ML-based innovations with tremendous potential. One promising way to address such privacy concerns is to use synthetic data instead. We propose a generative modeling framework, EHR-Safe, for generating highly realistic and privacy-preserving synthetic EHR data. EHR-Safe is based on a two-stage model that consists of sequential encoder-decoder networks and generative adversarial networks. Our innovations focus on the key challenging aspects of real-world EHR data: heterogeneity, sparsity, coexistence of numerical and categorical features with distinct characteristics, and time-varying features with highly varying sequence lengths. Under numerous evaluations, we demonstrate that the fidelity of EHR-Safe's synthetic data is almost identical to that of real data (<3% accuracy difference for the models trained on them) while yielding almost ideal performance in practical privacy metrics.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8935/10421926/11e8192d97ec/41746_2023_888_Fig1_HTML.jpg
