Identifying incarceration status in the electronic health record using large language models in emergency department settings.

Author information

Huang Thomas, Socrates Vimig, Gilson Aidan, Safranek Conrad, Chi Ling, Wang Emily A, Puglisi Lisa B, Brandt Cynthia, Taylor R Andrew, Wang Karen

Affiliations

Department of Emergency Medicine, Yale School of Medicine, New Haven, CT, USA.

Section for Biomedical Informatics and Data Science, Yale University School of Medicine, New Haven, CT, USA.

Publication information

J Clin Transl Sci. 2024 Mar 11;8(1):e53. doi: 10.1017/cts.2024.496. eCollection 2024.

Abstract

BACKGROUND

Incarceration is a significant social determinant of health, contributing to high morbidity, mortality, and racialized health inequities. However, incarceration status is largely invisible to health services research due to inadequate clinical electronic health record (EHR) capture. This study aims to develop, train, and validate natural language processing (NLP) techniques to more effectively identify incarceration status in the EHR.

METHODS

The study population consisted of adult patients (≥18 years old) who presented to the emergency department between June 2013 and August 2021. The EHR database was filtered for notes containing specific incarceration-related terms, and a random selection of 1,000 notes was then annotated for incarceration and further stratified into specific statuses of prior history, recent, and current incarceration. For NLP model development, 80% of the notes were used to train the Longformer-based and RoBERTa algorithms. The remaining 20% of the notes underwent analysis with GPT-4.
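The filter, sample, and split steps described above can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the term list, the function names `filter_notes` and `sample_and_split`, and the fixed seed are all assumptions, since the abstract does not enumerate the search terms or the sampling procedure.

```python
import random
import re

# Hypothetical term stems; the abstract does not list the actual search terms.
TERMS = ["incarcerat", "jail", "prison"]
PATTERN = re.compile("|".join(re.escape(t) for t in TERMS), re.IGNORECASE)


def filter_notes(notes):
    """Keep only notes that mention at least one incarceration-related term."""
    return [note for note in notes if PATTERN.search(note)]


def sample_and_split(notes, n_sample, train_frac=0.8, seed=0):
    """Draw a random sample of notes, then split it into train and held-out parts."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    sampled = rng.sample(notes, min(n_sample, len(notes)))
    cut = round(len(sampled) * train_frac)
    return sampled[:cut], sampled[cut:]
```

With `n_sample=1000` and `train_frac=0.8`, this would yield 800 notes for training and 200 held out, matching the 80%/20% proportions in the text. A keyword filter like this only selects candidate notes; the incarceration labels themselves came from manual annotation.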

RESULTS

There were 849 unique patients across 989 visits in the 1,000 annotated notes. Manual annotation revealed that 559 of the 1,000 notes (55.9%) contained evidence of incarceration history. ICD-10 codes (sensitivity: 4.8%, specificity: 99.1%, F1-score: 0.09) performed worse than RoBERTa NLP (sensitivity: 78.6%, specificity: 73.3%, F1-score: 0.79), Longformer NLP (sensitivity: 94.6%, specificity: 87.5%, F1-score: 0.93), and GPT-4 (sensitivity: 100%, specificity: 61.1%, F1-score: 0.86).
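For readers less familiar with the metrics reported above, the following sketch shows how sensitivity, specificity, and F1-score are conventionally computed from confusion-matrix counts. The function name `classification_metrics` and its return format are illustrative assumptions, and no counts from the study are used, because the abstract reports only the derived rates.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute standard binary-classification metrics from confusion counts.

    tp/fn: notes with incarceration evidence that were flagged / missed.
    tn/fp: notes without such evidence that were cleared / wrongly flagged.
    """
    def ratio(num, den):
        # Return 0.0 instead of raising when a denominator is empty.
        return num / den if den else 0.0

    sensitivity = ratio(tp, tp + fn)  # recall on positive notes
    specificity = ratio(tn, tn + fp)  # recall on negative notes
    precision = ratio(tp, tp + fp)    # share of flagged notes that are correct
    f1 = ratio(2 * precision * sensitivity, precision + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
    }
```

Because F1 is the harmonic mean of precision and sensitivity, it cannot be recovered from sensitivity and specificity alone; it also depends on the share of positive notes in the evaluation set. This is why a method with very low sensitivity, such as the ICD-10 codes here, has a low F1-score even when its specificity is high.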

CONCLUSIONS

Our advanced NLP models demonstrate a high degree of accuracy in identifying incarceration status from clinical notes. Further research is needed to explore their scaled implementation in population health initiatives and assess their potential to mitigate health disparities through tailored system interventions.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/908b/10966832/efcdd9067305/S2059866124004965_fig1.jpg
