Ensuring electronic medical record simulation through better training, modeling, and evaluation.

Affiliations

Department of Electrical Engineering and Computer Science, Vanderbilt University, Nashville, Tennessee, USA.

Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee, USA.

Publication information

J Am Med Inform Assoc. 2020 Jan 1;27(1):99-108. doi: 10.1093/jamia/ocz161.

Abstract

OBJECTIVE

Electronic medical records (EMRs) can support medical research and discovery, but privacy risks limit the sharing of such data on a wide scale. Various approaches have been developed to mitigate risk, including record simulation via generative adversarial networks (GANs). While GANs have shown promise in certain application domains, they lack a principled approach for EMR data, which leads to subpar simulation. In this article, we improve EMR simulation through a novel pipeline that (1) enhances the learning model, (2) incorporates evaluation criteria for data utility that inform learning, and (3) refines the training process.

MATERIALS AND METHODS

We propose a new electronic health record generator based on a GAN that uses the Wasserstein divergence and layer normalization. We designed 2 utility measures to characterize the similarity in structural properties between real and simulated EMRs in the original and latent spaces, respectively. We applied a filtering strategy to enhance GAN training for low-prevalence clinical concepts. We evaluated the new and existing GANs with utility and privacy measures (membership and disclosure attacks) using billing codes from over 1 million EMRs at Vanderbilt University Medical Center.
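The privacy evaluation relies on membership attacks. The abstract does not define the attack, so the following is a minimal sketch of one common distance-based formulation, assumed here rather than taken from this paper: each EMR is a binary vector over billing codes, and an attacker holding candidate records claims a candidate was in the training set if some synthetic record lies within a Hamming-distance threshold. The functions `hamming`, `membership_attack`, and `precision_recall` are hypothetical names for illustration.

```python
from typing import List, Sequence, Tuple

# Assumption: each record is a binary vector over billing codes (1 = code present).
Record = Sequence[int]


def hamming(a: Record, b: Record) -> int:
    """Count the codes on which two binary records disagree."""
    return sum(x != y for x, y in zip(a, b))


def membership_attack(
    candidates: Sequence[Record], synthetic: Sequence[Record], threshold: int
) -> List[bool]:
    """Hypothetical attack: claim membership if any synthetic record is within `threshold`."""
    return [any(hamming(c, s) <= threshold for s in synthetic) for c in candidates]


def precision_recall(claims: Sequence[bool], truth: Sequence[bool]) -> Tuple[float, float]:
    """Score the attacker's claims against true training-set membership."""
    tp = sum(c and t for c, t in zip(claims, truth))
    fp = sum(c and not t for c, t in zip(claims, truth))
    fn = sum(t and not c for c, t in zip(claims, truth))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


if __name__ == "__main__":
    synthetic = [[1, 0, 1, 0], [0, 1, 1, 0]]
    candidates = [[1, 0, 1, 1], [0, 0, 0, 1]]
    truth = [True, False]  # only the first candidate was a training record
    claims = membership_attack(candidates, synthetic, threshold=1)
    print(claims, precision_recall(claims, truth))  # prints: [True, False] (1.0, 1.0)
```

Under this formulation, higher precision and recall indicate that the synthetic data reveal more about who was in the training set; sweeping `threshold` traces the trade-off between the two.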

RESULTS

The proposed model outperformed the state-of-the-art approaches with significant improvement in retaining the nature of real records, including prediction performance and structural properties, without sacrificing privacy. Additionally, the filtering strategy achieved higher utility when the EMR training dataset was small.
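The results concern how well simulated records retain the nature of real ones. The abstract does not specify its two structural utility measures, so this is only a sketch of a simpler, widely used baseline check, dimension-wise probability: compare how often each billing code appears in real versus synthetic records. Summarizing the per-code gaps by their mean absolute difference is an illustrative choice, and `prevalence` and `dimension_wise_gap` are hypothetical names.

```python
from typing import List, Sequence

# Assumption: each record is a binary vector over billing codes (1 = code present).
Record = Sequence[int]


def prevalence(records: Sequence[Record]) -> List[float]:
    """Fraction of records in which each code appears."""
    n = len(records)
    return [sum(r[j] for r in records) / n for j in range(len(records[0]))]


def dimension_wise_gap(real: Sequence[Record], synthetic: Sequence[Record]) -> float:
    """Mean absolute difference between real and synthetic per-code prevalence."""
    p, q = prevalence(real), prevalence(synthetic)
    return sum(abs(a - b) for a, b in zip(p, q)) / len(p)


if __name__ == "__main__":
    real = [[1, 0, 1], [1, 1, 0]]
    synthetic = [[1, 0, 0], [0, 1, 0]]
    print(prevalence(real), prevalence(synthetic))
    print(dimension_wise_gap(real, synthetic))
```

A gap near zero means code frequencies are preserved, but this check ignores co-occurrence between codes, which is the kind of structure that richer utility measures aim to capture.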

CONCLUSIONS

These findings illustrate that EMR simulation through GANs can be substantially improved through more appropriate training, modeling, and evaluation criteria.

Similar articles

Anonymization Through Data Synthesis Using Generative Adversarial Networks (ADS-GAN).
IEEE J Biomed Health Inform. 2020 Aug;24(8):2378-2388. doi: 10.1109/JBHI.2020.2980262. Epub 2020 Mar 12.

Tunable Privacy Risk Evaluation of Generative Adversarial Networks.
Stud Health Technol Inform. 2024 Aug 22;316:1233-1237. doi: 10.3233/SHTI240634.

CTAB-GAN+: enhancing tabular data synthesis.
Front Big Data. 2024 Jan 8;6:1296508. doi: 10.3389/fdata.2023.1296508. eCollection 2023.

Cited by

Clinical Research Informatics: a Decade-in-Review.
Yearb Med Inform. 2024 Aug;33(1):127-142. doi: 10.1055/s-0044-1800732. Epub 2025 Apr 8.

PromptEHR: Conditional Electronic Healthcare Records Generation with Prompt Learning.
Proc Conf Empir Methods Nat Lang Process. 2022 Dec;2022:2873-2885. doi: 10.18653/v1/2022.emnlp-main.185.
