Data-driven approach for creating synthetic electronic medical records.

Affiliation

Johns Hopkins University Applied Physics Laboratory, 11100 Johns Hopkins Rd, Laurel, MD 20723-6099, USA.

Publication information

BMC Med Inform Decis Mak. 2010 Oct 14;10:59. doi: 10.1186/1472-6947-10-59.

DOI:10.1186/1472-6947-10-59

PMID:20946670

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2972239/

Abstract

BACKGROUND

New algorithms for disease outbreak detection are being developed to take advantage of full electronic medical records (EMRs) that contain a wealth of patient information. However, due to privacy concerns, even anonymized EMRs cannot be shared among researchers, resulting in great difficulty in comparing the effectiveness of these algorithms. To bridge the gap between novel bio-surveillance algorithms operating on full EMRs and the lack of non-identifiable EMR data, a method for generating complete and synthetic EMRs was developed.

METHODS

This paper describes a novel methodology for generating complete synthetic EMRs both for an outbreak illness of interest (tularemia) and for background records. The method developed has three major steps: 1) synthetic patient identity and basic information generation; 2) identification of care patterns that the synthetic patients would receive based on the information present in real EMR data for similar health problems; 3) adaptation of these care patterns to the synthetic patient population.
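The three steps can be sketched in a few lines of Python. This is only an illustrative outline under stated assumptions, not the authors' implementation: the `CARE_PATTERNS` table, the event tuples, and every function name below are hypothetical, and the tiny hand-written patterns stand in for care patterns that the paper derives from real EMR data.

```python
import random
from dataclasses import dataclass, field


@dataclass
class SyntheticPatient:
    patient_id: int
    age: int
    sex: str
    complaint: str
    record: list = field(default_factory=list)


# Hypothetical stand-in for care patterns mined from real EMRs, keyed by
# health problem. Each pattern is a list of
# (day offset from first visit, event type, detail) tuples.
CARE_PATTERNS = {
    "fever": [
        [(0, "visit", "outpatient"), (0, "lab_order", "CBC"), (1, "lab_result", "CBC")],
        [(0, "visit", "emergency"), (0, "radiology_order", "chest x-ray"),
         (0, "radiology_result", "chest x-ray")],
    ],
    "cough": [
        [(0, "visit", "outpatient"), (2, "visit", "outpatient")],
    ],
}


def generate_patients(n, rng, complaints):
    """Step 1: create synthetic identities and basic information."""
    return [
        SyntheticPatient(i, rng.randint(4, 11), rng.choice(["F", "M"]), rng.choice(complaints))
        for i in range(n)
    ]


def identify_care_pattern(patient, rng):
    """Step 2: pick a care pattern observed for a similar health problem."""
    return rng.choice(CARE_PATTERNS[patient.complaint])


def adapt_care_pattern(patient, pattern, start_day):
    """Step 3: shift the pattern onto the synthetic patient's own timeline."""
    patient.record = [(start_day + offset, kind, detail) for offset, kind, detail in pattern]
    return patient


def synthesize(n, seed=0):
    """Run all three steps for n synthetic patients (seeded for repeatability)."""
    rng = random.Random(seed)
    patients = generate_patients(n, rng, sorted(CARE_PATTERNS))
    for patient in patients:
        pattern = identify_care_pattern(patient, rng)
        adapt_care_pattern(patient, pattern, start_day=rng.randint(0, 29))
    return patients
```

Calling `synthesize(5, seed=1)` returns five patients aged 4 to 11, each carrying a dated list of visit, laboratory, and radiology events. Seeding the generator keeps runs repeatable, which would make it easier for a reviewer to re-inspect the same records after corrections.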

RESULTS

We generated EMRs, including visit records, clinical activity, laboratory orders/results and radiology orders/results for 203 synthetic tularemia outbreak patients. Validation of the records by a medical expert revealed problems in 19% of the records; these were subsequently corrected. We also generated background EMRs for over 3000 patients in the 4-11 year age group. Validation of those records by a medical expert revealed problems in fewer than 3% of these background patient EMRs, and the errors were subsequently rectified.

CONCLUSIONS

A data-driven method was developed for generating fully synthetic EMRs. The method is general and can be applied to any data set that has similar data elements (such as laboratory and radiology orders and results, clinical activity, prescription orders). The pilot synthetic outbreak records were for tularemia, but our approach may be adapted to other infectious diseases. The pilot synthetic background records were for the 4-11 year age group. The adaptations that must be made to the algorithms to produce synthetic background EMRs for other age groups are indicated.

Similar articles

Construction and validation of synthetic electronic medical records.

Online J Public Health Inform. 2009;1(1). doi: 10.5210/ojphi.v1i1.2720. Epub 2009 Dec 10.

Gastrointestinal disease outbreak detection using multiple data streams from electronic medical records.

Foodborne Pathog Dis. 2012 May;9(5):431-41. doi: 10.1089/fpd.2011.1036. Epub 2012 Mar 19.

Extracting information from the text of electronic medical records to improve case detection: a systematic review.

J Am Med Inform Assoc. 2016 Sep;23(5):1007-15. doi: 10.1093/jamia/ocv180. Epub 2016 Feb 5.

A method for cohort selection of cardiovascular disease records from an electronic health record system.

Int J Med Inform. 2017 Jun;102:138-149. doi: 10.1016/j.ijmedinf.2017.03.015. Epub 2017 Mar 30.

[A customized method for information extraction from unstructured text data in the electronic medical records].

Beijing Da Xue Xue Bao Yi Xue Ban. 2018 Apr 18;50(2):256-263.

Commentary: The RIME/EMR scheme: an educational approach to clinical documentation in electronic medical records.

Acad Med. 2011 Jan;86(1):11-4. doi: 10.1097/ACM.0b013e3181ff7271.

Are electronic medical records helpful for care coordination? Experiences of physician practices.

J Gen Intern Med. 2010 Mar;25(3):177-85. doi: 10.1007/s11606-009-1195-2. Epub 2009 Dec 22.

Improving the Path from Diagnoses to Documentation: A Cognitive Review Tool for Clinical Notes and Administrative Records.

AMIA Annu Symp Proc. 2018 Dec 5;2018:518-526. eCollection 2018.

A Text Structuring Method for Chinese Medical Text Based on Temporal Information.

Int J Environ Res Public Health. 2018 Feb 27;15(3):402. doi: 10.3390/ijerph15030402.

Cited by

Generating synthetic electronic health record data: a methodological scoping review with benchmarking on phenotype data and open-source software.

J Am Med Inform Assoc. 2025 Jul 1;32(7):1227-1240. doi: 10.1093/jamia/ocaf082.

Utility-based Analysis of Statistical Approaches and Deep Learning Models for Synthetic Data Generation With Focus on Correlation Structures: Algorithm Development and Validation.

JMIR AI. 2025 Mar 20;4:e65729. doi: 10.2196/65729.

PromptEHR: Conditional Electronic Healthcare Records Generation with Prompt Learning.

Proc Conf Empir Methods Nat Lang Process. 2022 Dec;2022:2873-2885. doi: 10.18653/v1/2022.emnlp-main.185.

Generation of a Realistic Synthetic Laryngeal Cancer Cohort for AI Applications.

Cancers (Basel). 2024 Feb 1;16(3):639. doi: 10.3390/cancers16030639.

Techniques to produce and evaluate realistic multivariate synthetic data.

Sci Rep. 2023 Jul 28;13(1):12266. doi: 10.1038/s41598-023-38832-0.

: A method for synthetic opinions to yield a robust fuzzy expert system.

MethodsX. 2023 Mar 8;10:102112. doi: 10.1016/j.mex.2023.102112. eCollection 2023.

A method for generating synthetic longitudinal health data.

BMC Med Res Methodol. 2023 Mar 23;23(1):67. doi: 10.1186/s12874-023-01869-w.

Conditional generation of medical time series for extrapolation to underrepresented populations.

PLOS Digit Health. 2022 Jul 19;1(7):e0000074. doi: 10.1371/journal.pdig.0000074. eCollection 2022 Jul.

Chronic Lymphocytic Leukemia Progression Diagnosis with Intrinsic Cellular Patterns via Unsupervised Clustering.

Cancers (Basel). 2022 May 13;14(10):2398. doi: 10.3390/cancers14102398.

Pretrained transformer framework on pediatric claims data for population specific tasks.

Sci Rep. 2022 Mar 7;12(1):3651. doi: 10.1038/s41598-022-07545-1.
