Generating contextual embeddings for emergency department chief complaints.

Authors

Chang David, Hong Woo Suk, Taylor Richard Andrew

Affiliations

Computational Biology and Bioinformatics Program, Yale University, New Haven, Connecticut, USA.

Department of Emergency Medicine, Yale School of Medicine, New Haven, Connecticut, USA.

Publication

JAMIA Open. 2020 Jul 15;3(2):160-166. doi: 10.1093/jamiaopen/ooaa022. eCollection 2020 Jul.

DOI: 10.1093/jamiaopen/ooaa022
PMID: 32734154
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7382638/
Abstract

OBJECTIVE

We learn contextual embeddings for emergency department (ED) chief complaints using Bidirectional Encoder Representations from Transformers (BERT), a state-of-the-art language model, to derive a compact and computationally useful representation for free-text chief complaints.

MATERIALS AND METHODS

Retrospective data on 2.1 million adult and pediatric ED visits were obtained from a large healthcare system, covering the period from March 2013 to July 2019. A total of 355 497 (16.4%) visits from 65 737 (8.9%) patients were removed because either the structured or the unstructured chief complaint was missing. To ensure an adequate training set size, chief complaint labels comprising less than 0.01% (1 in 10 000) of all visits were excluded. The cutoff threshold was then increased on a log scale to create seven datasets of decreasing sparsity. The classification task was to predict the provider-assigned label from the free-text chief complaint using BERT, with Long Short-Term Memory (LSTM) and Embeddings from Language Models (ELMo) as baselines. Performance was measured as Top-k accuracy for k = 1 to 5 on a hold-out test set comprising 5% of the samples. The embedding for each free-text chief complaint was taken from the final 768-dimensional layer of the BERT model and visualized using t-distributed stochastic neighbor embedding (t-SNE).
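The label-filtering step can be made concrete. Below is a minimal Python sketch, not the authors' code: it assumes each visit is a (free-text complaint, provider-assigned label) pair, measures each label's share against all visits, and generates the seven cutoffs as equal log10 steps from the stated 0.01% floor up to a hypothetical 1% ceiling (the abstract gives neither the upper bound nor the step size). All function names and the toy data are illustrative.

```python
import math
from collections import Counter


def log_spaced_thresholds(start, stop, n):
    """Return n cutoffs spaced evenly on a log10 scale from start to stop, inclusive."""
    if n == 1:
        return [start]
    lo, hi = math.log10(start), math.log10(stop)
    step = (hi - lo) / (n - 1)
    return [10 ** (lo + i * step) for i in range(n)]


def filter_by_label_frequency(visits, min_fraction):
    """Keep (text, label) pairs whose label accounts for at least min_fraction of all visits."""
    counts = Counter(label for _, label in visits)
    total = len(visits)
    keep = {label for label, count in counts.items() if count / total >= min_fraction}
    return [(text, label) for text, label in visits if label in keep]


def build_datasets(visits, thresholds):
    """Build one filtered dataset per cutoff; higher cutoffs drop more rare labels."""
    return {t: filter_by_label_frequency(visits, t) for t in thresholds}


# Hypothetical schedule: 0.01% floor (from the abstract) up to an ASSUMED 1% ceiling.
thresholds = log_spaced_thresholds(1e-4, 1e-2, 7)

# Toy visits for illustration only (not study data).
visits = (
    [("chest pain x2 days", "CHEST PAIN")] * 6
    + [("sob", "SHORTNESS OF BREATH")] * 3
    + [("itchy rash", "RASH")]
)
for cutoff in (0.05, 0.2, 0.5):
    labels = {label for _, label in filter_by_label_frequency(visits, cutoff)}
    print(cutoff, sorted(labels))
```

Because each label's share is computed against the full visit count, raising the cutoff can only remove labels, so successive datasets keep fewer, more frequent labels.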

RESULTS

Performance improved as dataset sparsity decreased, and BERT outperformed both LSTM and ELMo. On datasets comprising 434 and 188 labels, respectively, the BERT model achieved Top-1 accuracies of 0.65 and 0.69, Top-3 accuracies of 0.87 and 0.90, and Top-5 accuracies of 0.92 and 0.94. Visualization using t-SNE mapped the learned embeddings in a clinically meaningful way, with related concepts embedded close to each other and broader types of chief complaints clustered together.
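Top-k accuracy counts a prediction as correct when the provider-assigned label appears anywhere among the model's k highest-scoring labels, so it can only stay the same or rise as k grows, consistent with Top-1 < Top-3 < Top-5 in the results. A minimal Python sketch, assuming the classifier outputs one score per label for each test sample; the scores below are made up for illustration.

```python
def top_k_accuracy(scores, true_labels, k):
    """Fraction of samples whose true label index is among the k highest scores.

    scores: one list of per-label scores for each sample.
    true_labels: the true label index for each sample.
    """
    if len(scores) != len(true_labels):
        raise ValueError("scores and true_labels must have the same length")
    hits = 0
    for row, truth in zip(scores, true_labels):
        # Label indices sorted by descending score; ties keep the lower index first.
        top_k = sorted(range(len(row)), key=lambda i: -row[i])[:k]
        if truth in top_k:
            hits += 1
    return hits / len(true_labels)


# Made-up scores for 4 samples over 3 labels.
scores = [
    [0.1, 0.7, 0.2],
    [0.5, 0.3, 0.2],
    [0.2, 0.3, 0.5],
    [0.6, 0.3, 0.1],
]
true_labels = [1, 1, 0, 2]
for k in range(1, 4):
    print(f"Top-{k} accuracy: {top_k_accuracy(scores, true_labels, k):.2f}")
```

With these toy scores, Top-1, Top-2, and Top-3 accuracy come out to 0.25, 0.50, and 1.00.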

DISCUSSION

Despite the inherent noise in the chief complaint label space, the model was able to learn a rich representation of chief complaints and generate reasonable predictions of their labels. The learned embeddings accurately predict provider-assigned chief complaint labels and map semantically similar chief complaints to nearby points in vector space.
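One way to check that semantically similar chief complaints land near each other is to rank stored embeddings by cosine similarity to a query embedding. The sketch below is an illustration under assumptions, not the study's method (the paper inspects neighborhoods visually with t-SNE): cosine similarity is a common choice assumed here, and the three-dimensional vectors are invented stand-ins for the 768-dimensional BERT embeddings.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between vectors u and v."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_u * norm_v)


def nearest_neighbors(query, embeddings, n=3):
    """Return the n (name, similarity) pairs most similar to the query vector."""
    ranked = sorted(
        ((name, cosine_similarity(query, vec)) for name, vec in embeddings.items()),
        key=lambda pair: pair[1],
        reverse=True,
    )
    return ranked[:n]


# Hypothetical toy embeddings (real BERT embeddings would be 768-dimensional).
embeddings = {
    "chest pain": [0.9, 0.1, 0.0],
    "chest tightness": [0.8, 0.2, 0.1],
    "ankle injury": [0.0, 0.1, 0.9],
}
query = [1.0, 0.1, 0.0]
for name, sim in nearest_neighbors(query, embeddings):
    print(f"{name}: {sim:.3f}")
```

With these toy vectors, the query ranks "chest pain" first and "chest tightness" second, while "ankle injury" falls last.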

CONCLUSION

Such a model may be used to automatically map free-text chief complaints to structured fields and to assist the development of a standardized, data-driven ontology of chief complaints for healthcare institutions.

Similar articles

1
Generating contextual embeddings for emergency department chief complaints.
JAMIA Open. 2020 Jul 15;3(2):160-166. doi: 10.1093/jamiaopen/ooaa022. eCollection 2020 Jul.
2
Identifying the Perceived Severity of Patient-Generated Telemedical Queries Regarding COVID: Developing and Evaluating a Transfer Learning-Based Solution.
JMIR Med Inform. 2022 Sep 2;10(9):e37770. doi: 10.2196/37770.
3
RadioBERT: A deep learning-based system for medical report generation from chest X-ray images using contextual embeddings.
J Biomed Inform. 2022 Nov;135:104220. doi: 10.1016/j.jbi.2022.104220. Epub 2022 Oct 10.
4
Extracting comprehensive clinical information for breast cancer using deep learning methods.
Int J Med Inform. 2019 Dec;132:103985. doi: 10.1016/j.ijmedinf.2019.103985. Epub 2019 Oct 2.
5
A comparative study on deep learning models for text classification of unstructured medical notes with various levels of class imbalance.
BMC Med Res Methodol. 2022 Jul 2;22(1):181. doi: 10.1186/s12874-022-01665-y.
6
Disease Concept-Embedding Based on the Self-Supervised Method for Medical Information Extraction from Electronic Health Records and Disease Retrieval: Algorithm Development and Validation Study.
J Med Internet Res. 2021 Jan 27;23(1):e25113. doi: 10.2196/25113.
7
An Automated Toxicity Classification on Social Media Using LSTM and Word Embedding.
Comput Intell Neurosci. 2022 Feb 15;2022:8467349. doi: 10.1155/2022/8467349. eCollection 2022.
8
When BERT meets Bilbo: a learning curve analysis of pretrained language model on disease classification.
BMC Med Inform Decis Mak. 2022 Apr 5;21(Suppl 9):377. doi: 10.1186/s12911-022-01829-2.
9
Modified Bidirectional Encoder Representations From Transformers Extractive Summarization Model for Hospital Information Systems Based on Character-Level Tokens (AlphaBERT): Development and Performance Evaluation.
JMIR Med Inform. 2020 Apr 29;8(4):e17787. doi: 10.2196/17787.
10
An efficient method for disaster tweets classification using gradient-based optimized convolutional neural networks with BERT embeddings.
MethodsX. 2024 Jul 3;13:102843. doi: 10.1016/j.mex.2024.102843. eCollection 2024 Dec.

Cited by

1
A large language model based pipeline for extracting information from patient complaint and anamnesis in clinical notes for severity assessment.
Sci Rep. 2025 Jul 14;15(1):25345. doi: 10.1038/s41598-025-07649-4.
2
Predicting sepsis treatment decisions in the paediatric emergency department using machine learning: the AiSEPTRON study.
BMJ Paediatr Open. 2025 May 14;9(1):e003273. doi: 10.1136/bmjpo-2024-003273.
3
Exploring diagnostic stewardship in the emergency department evaluation of pediatric abdominal pain in a statewide quality collaborative.
Acad Emerg Med. 2025 Mar;32(3):309-319. doi: 10.1111/acem.15075. Epub 2025 Jan 5.
4
BERT-based natural language processing analysis of French CT reports: Application to the measurement of the positivity rate for pulmonary embolism.
Res Diagn Interv Imaging. 2023 Mar 27;6:100027. doi: 10.1016/j.redii.2023.100027. eCollection 2023 Jun.
5
The Role of Large Language Models in Transforming Emergency Medicine: Scoping Review.
JMIR Med Inform. 2024 May 10;12:e53787. doi: 10.2196/53787.
6
Language model and its interpretability in biomedicine: A scoping review.
iScience. 2024 Feb 24;27(4):109334. doi: 10.1016/j.isci.2024.109334. eCollection 2024 Apr 19.
7
Applications of natural language processing at emergency department triage: A narrative review.
PLoS One. 2023 Dec 14;18(12):e0279953. doi: 10.1371/journal.pone.0279953. eCollection 2023.
8
Natural language processing with machine learning methods to analyze unstructured patient-reported outcomes derived from electronic health records: A systematic review.
Artif Intell Med. 2023 Dec;146:102701. doi: 10.1016/j.artmed.2023.102701. Epub 2023 Nov 1.
9
Sampling and ranking spatial transcriptomics data embeddings to identify tissue architecture.
Front Genet. 2022 Aug 12;13:912813. doi: 10.3389/fgene.2022.912813. eCollection 2022.