Natural language generation for electronic health records.

Author information

Lee Scott H

Affiliation

Centers for Disease Control and Prevention, Atlanta, GA, USA.

Publication information

NPJ Digit Med. 2018 Nov 19;1:63. doi: 10.1038/s41746-018-0070-0. Print 2018.

DOI:10.1038/s41746-018-0070-0
PMID:30687797
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6345174/
Abstract

One broad goal of biomedical informatics is to generate fully-synthetic, faithfully representative electronic health records (EHRs) to facilitate data sharing between healthcare providers and researchers and promote methodological research. A variety of methods exist for generating synthetic EHRs, but they are not capable of generating unstructured text, like emergency department (ED) chief complaints, history of present illness, or progress notes. Here, we use the encoder-decoder model, a deep learning algorithm that features in many contemporary machine translation systems, to generate synthetic chief complaints from discrete variables in EHRs, like age group, gender, and discharge diagnosis. After being trained end-to-end on authentic records, the model can generate realistic chief complaint text that appears to preserve the epidemiological information encoded in the original record-sentence pairs. As a side effect of the model's optimization goal, these synthetic chief complaints are also free of relatively uncommon abbreviations and misspellings, and they include none of the personally identifiable information (PII) that was in the training data, suggesting that this model may be used to support the de-identification of text in EHRs. When combined with algorithms like generative adversarial networks (GANs), our model could be used to generate fully-synthetic EHRs, allowing healthcare providers to share faithful representations of multimodal medical data without compromising patient privacy. This is an important advance that we hope will facilitate the development of machine-learning methods for clinical decision support, disease surveillance, and other data-hungry applications in biomedical informatics.
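
The mapping from discrete record fields to text that the abstract describes can be illustrated with a minimal sketch. It is not the authors' implementation. The category vocabularies, the one-hot concatenation, the greedy decoding loop, and the lookup table `TOY_PHRASES` (which stands in for a trained decoder network) are all assumptions made for illustration.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical category vocabularies; the abstract does not list the real values.
AGE_GROUPS = ["0-17", "18-44", "45-64", "65+"]
GENDERS = ["F", "M"]
DIAGNOSES = ["asthma", "ankle sprain", "chest pain"]


def one_hot(value: str, vocab: Sequence[str]) -> List[int]:
    """Return a one-hot vector for value; raises ValueError if it is unknown."""
    index = list(vocab).index(value)
    return [1 if i == index else 0 for i in range(len(vocab))]


def encode_record(record: Dict[str, str]) -> List[int]:
    """Concatenate the one-hot encodings of the discrete fields (assumed input format)."""
    return (
        one_hot(record["age_group"], AGE_GROUPS)
        + one_hot(record["gender"], GENDERS)
        + one_hot(record["diagnosis"], DIAGNOSES)
    )


def greedy_decode(
    features: List[int],
    next_token: Callable[[List[int], List[str]], str],
    max_len: int = 10,
) -> List[str]:
    """Emit one token at a time until the decoder returns the end marker."""
    tokens: List[str] = []
    while len(tokens) < max_len:
        token = next_token(features, tokens)
        if token == "<end>":
            break
        tokens.append(token)
    return tokens


# Toy stand-in for a trained decoder: one fixed phrase per diagnosis.
TOY_PHRASES = {
    "asthma": ["shortness", "of", "breath"],
    "ankle sprain": ["ankle", "pain"],
    "chest pain": ["chest", "pain"],
}


def toy_next_token(features: List[int], prefix: List[str]) -> str:
    """Read the diagnosis back out of the feature vector and continue its phrase."""
    diagnosis_bits = features[len(AGE_GROUPS) + len(GENDERS):]
    phrase = TOY_PHRASES[DIAGNOSES[diagnosis_bits.index(1)]]
    return phrase[len(prefix)] if len(prefix) < len(phrase) else "<end>"


record = {"age_group": "18-44", "gender": "F", "diagnosis": "asthma"}
features = encode_record(record)
complaint = " ".join(greedy_decode(features, toy_next_token))
```

In a trained model, `toy_next_token` would be replaced by a learned network that scores every vocabulary word given the record features and the tokens generated so far. The loop structure stays the same.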


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1cab/6550163/22a1e4ae4db2/41746_2018_70_Fig1_HTML.jpg
