Generating Synthetic Electronic Health Record Data Using Generative Adversarial Networks: Tutorial.

Authors

Yan Chao, Zhang Ziqi, Nyemba Steve, Li Zhuohang

Affiliations

Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, United States.

Department of Computer Science, Vanderbilt University, Nashville, TN, United States.

Publication information

JMIR AI. 2024 Apr 22;3:e52615. doi: 10.2196/52615.

DOI: 10.2196/52615
PMID: 38875595
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11074891/
Abstract

Synthetic electronic health record (EHR) data generation has been increasingly recognized as an important solution to expand the accessibility and maximize the value of private health data on a large scale. Recent advances in machine learning have facilitated more accurate modeling for complex and high-dimensional data, thereby greatly enhancing the data quality of synthetic EHR data. Among various approaches, generative adversarial networks (GANs) have become the main technical path in the literature due to their ability to capture the statistical characteristics of real data. However, there is a scarcity of detailed guidance within the domain regarding the development procedures of synthetic EHR data. The objective of this tutorial is to present a transparent and reproducible process for generating structured synthetic EHR data using a publicly accessible EHR data set as an example. We cover the topics of GAN architecture, EHR data types and representation, data preprocessing, GAN training, synthetic data generation and postprocessing, and data quality evaluation. We conclude this tutorial by discussing multiple important issues and future opportunities in this domain. The source code of the entire process has been made publicly available.
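The abstract lists EHR data representation, preprocessing, and postprocessing as pipeline steps but does not spell them out. The following is a minimal sketch of one common approach for mixed-type tabular records. It one-hot encodes categorical fields and min-max scales continuous fields into [0, 1] vectors that a GAN could train on. It then decodes soft generator outputs back into records by taking the highest-scoring category and clipping continuous values. This is not the authors' released source code. The schema (`sex`, `admission_type`, `age`, `heart_rate`), the `Encoder` class, `dimension_wise_means`, and the toy records are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence

# Hypothetical toy schema for illustration only; not the tutorial's data set.
CATEGORICAL: Dict[str, List[str]] = {
    "sex": ["F", "M"],
    "admission_type": ["elective", "emergency", "urgent"],
}
CONTINUOUS: List[str] = ["age", "heart_rate"]


@dataclass
class Encoder:
    """Maps mixed-type records to [0, 1] vectors (possible GAN input) and back."""

    mins: Dict[str, float]
    maxs: Dict[str, float]

    @classmethod
    def fit(cls, records: Sequence[dict]) -> "Encoder":
        # Learn the min/max of each continuous column from the real records.
        mins = {c: min(r[c] for r in records) for c in CONTINUOUS}
        maxs = {c: max(r[c] for r in records) for c in CONTINUOUS}
        return cls(mins, maxs)

    def encode(self, record: dict) -> List[float]:
        vec: List[float] = []
        for col, levels in CATEGORICAL.items():
            # One-hot encoding: one slot per category level.
            vec.extend(1.0 if record[col] == lvl else 0.0 for lvl in levels)
        for col in CONTINUOUS:
            span = (self.maxs[col] - self.mins[col]) or 1.0  # avoid division by zero
            # Min-max scaling into [0, 1].
            vec.append((record[col] - self.mins[col]) / span)
        return vec

    def decode(self, vec: Sequence[float]) -> dict:
        record: dict = {}
        i = 0
        for col, levels in CATEGORICAL.items():
            group = list(vec[i:i + len(levels)])
            # Postprocessing: the generator emits soft scores, so keep the largest.
            record[col] = levels[max(range(len(levels)), key=lambda j: group[j])]
            i += len(levels)
        for col in CONTINUOUS:
            x = min(max(vec[i], 0.0), 1.0)  # clip out-of-range generator outputs
            record[col] = self.mins[col] + x * (self.maxs[col] - self.mins[col])
            i += 1
        return record


def dimension_wise_means(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Per-dimension means; comparing real vs. synthetic means is a basic fidelity check."""
    n = len(vectors)
    return [sum(v[d] for v in vectors) / n for d in range(len(vectors[0]))]


# Invented toy records standing in for real EHR rows.
real = [
    {"sex": "F", "admission_type": "emergency", "age": 70, "heart_rate": 88},
    {"sex": "M", "admission_type": "elective", "age": 40, "heart_rate": 72},
    {"sex": "M", "admission_type": "urgent", "age": 55, "heart_rate": 100},
]
enc = Encoder.fit(real)
vecs = [enc.encode(r) for r in real]
# A made-up "generator output" decoded back into a record.
synthetic = enc.decode([0.2, 0.7, 0.1, 0.3, 0.6, 0.5, 0.25])
```

In a full pipeline, the encoded vectors would be the GAN's training data. `dimension_wise_means` would then be computed on real and generated vectors and compared. Highest-score decoding and clipping are only one simple postprocessing choice. Sampling from the soft scores, or applying domain rules such as valid value ranges, are alternatives.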

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f06e/11074891/7e5e11f16c2d/ai_v3i1e52615_fig1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f06e/11074891/4ffd1fed5735/ai_v3i1e52615_fig2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f06e/11074891/8da53fa23c79/ai_v3i1e52615_fig3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f06e/11074891/b193566b56b0/ai_v3i1e52615_fig4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f06e/11074891/52f5dc88b7f4/ai_v3i1e52615_fig5.jpg
