Generating sequential electronic health records using dual adversarial autoencoder.

Author affiliations

Department of Computer Science and Engineering, Pohang University of Science and Technology, Pohang, South Korea.

School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, Texas, USA.

Publication information

J Am Med Inform Assoc. 2020 Jul 1;27(9):1411-1419. doi: 10.1093/jamia/ocaa119.

DOI:10.1093/jamia/ocaa119

PMID:32989459

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7647348/

Abstract

OBJECTIVE

Recent studies on electronic health records (EHRs) have begun to train deep generative models that synthesize large volumes of realistic records, in order to address the significant privacy issues surrounding EHRs. However, most of them focus only on structured records of patients' independent visits rather than on chronological clinical records. In this article, we aim to learn and synthesize realistic sequences of EHRs based on a generative autoencoder.

MATERIALS AND METHODS

We propose a dual adversarial autoencoder (DAAE), which learns set-valued sequences of medical entities by combining a recurrent autoencoder with 2 generative adversarial networks (GANs). DAAE improves the mode coverage and quality of generated sequences by adversarially learning both the continuous latent distribution and the discrete data distribution. Using the MIMIC-III (Medical Information Mart for Intensive Care-III) and UT Physicians clinical databases, we evaluated DAAE's performance in terms of predictive modeling, plausibility, and privacy preservation.
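To make the phrase "set-valued sequences of medical entities" concrete, the following is a minimal sketch of one plausible input representation: each patient is a chronological list of visits, and each visit is an unordered set of codes, encoded as a multi-hot vector. The code names (`DX_A`, `PROC_X`, and so on) and the helper functions `build_vocab` and `encode_patient` are hypothetical and chosen for illustration; the abstract does not describe the authors' actual preprocessing.

```python
from typing import Dict, List, Set

Visit = Set[str]          # one visit = an unordered set of medical codes
Patient = List[Visit]     # one patient = visits in chronological order


def build_vocab(patients: List[Patient]) -> Dict[str, int]:
    """Map every distinct code to a column index (sorted for determinism)."""
    codes = sorted({code for visits in patients for visit in visits for code in visit})
    return {code: index for index, code in enumerate(codes)}


def encode_patient(visits: Patient, vocab: Dict[str, int]) -> List[List[int]]:
    """Turn a sequence of code sets into a sequence of multi-hot vectors."""
    rows = []
    for visit in visits:
        row = [0] * len(vocab)
        for code in visit:
            row[vocab[code]] = 1
        rows.append(row)
    return rows


# Hypothetical toy data: two patients with illustrative code names.
patients = [
    [{"DX_A", "DX_B"}, {"DX_B", "PROC_X"}],
    [{"DX_C"}],
]
vocab = build_vocab(patients)
encoded = encode_patient(patients[0], vocab)
```

A sequence model such as a recurrent autoencoder would then consume one multi-hot row per time step; the discreteness of these rows is what makes the data distribution "discrete" in the sense used above.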

RESULTS

Our generated sequences of EHRs showed performance comparable to that of real data on a predictive modeling task, and achieved the best score among all baseline models in a plausibility evaluation conducted by medical experts. In addition, differentially private optimization of our model enables the generation of synthetic sequences without increasing the privacy leakage of patients' data.
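The abstract does not say which differentially private optimizer was used. As an assumption for illustration only, the sketch below shows the core step of one widely used approach, DP-SGD: clip each per-example gradient to a fixed L2 norm, sum the clipped gradients, add Gaussian noise scaled to the clipping bound, and average. The function names and the parameters `max_norm` and `noise_multiplier` are hypothetical, and no privacy accounting is included.

```python
import math
import random
from typing import List


def clip_gradient(grad: List[float], max_norm: float) -> List[float]:
    """Rescale a gradient so that its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def private_mean_gradient(
    per_example_grads: List[List[float]],
    max_norm: float,
    noise_multiplier: float,
    rng: random.Random,
) -> List[float]:
    """Clip per-example gradients, add Gaussian noise to their sum, then average."""
    clipped = [clip_gradient(g, max_norm) for g in per_example_grads]
    dim = len(clipped[0])
    summed = [sum(g[i] for g in clipped) for i in range(dim)]
    # Noise standard deviation is proportional to the clipping bound.
    noisy = [s + rng.gauss(0.0, noise_multiplier * max_norm) for s in summed]
    return [value / len(clipped) for value in noisy]


rng = random.Random(0)
grads = [[3.0, 4.0], [0.0, 0.5]]  # toy per-example gradients
update = private_mean_gradient(grads, max_norm=1.0, noise_multiplier=1.1, rng=rng)
```

Clipping bounds how much any single patient's record can move the model, and the added noise masks what remains; together these are what allow a formal privacy guarantee to be stated for the trained generator.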

CONCLUSIONS

DAAE can effectively synthesize sequential EHRs by addressing the two main challenges of the task: the synthetic records should be realistic enough that they cannot be distinguished from real records, and they should cover all the training patients so that performance on specific downstream tasks can be reproduced.




