SynthEHR-Eviction: Enhancing Eviction SDoH Detection with LLM-Augmented Synthetic EHR Data.

Authors

Yao Zonghai, Zhao Youxia, Mitra Avijit, Levy David A, Druhl Emily, Tsai Jack, Yu Hong

Publication information

medRxiv. 2025 Jul 14:2025.07.10.25331237. doi: 10.1101/2025.07.10.25331237.

Abstract

Eviction is a significant yet understudied social determinant of health (SDoH), linked to housing instability, unemployment, and mental health. While eviction appears in unstructured electronic health records (EHRs), it is rarely coded in structured fields, limiting downstream applications. We introduce SynthEHR-Eviction, a scalable pipeline combining LLMs, human-in-the-loop annotation, and automated prompt optimization (APO) to extract eviction statuses from clinical notes. Using this pipeline, we created the largest public eviction-related SDoH dataset to date, comprising 14 fine-grained categories. Fine-tuned LLMs (e.g., Qwen2.5, LLaMA3) trained on SynthEHR-Eviction achieved Macro-F1 scores of 88.8% (eviction) and 90.3% (other SDoH) on human-validated data, outperforming GPT-4o-APO (87.8%, 87.3%), GPT-4o-mini-APO (69.1%, 78.1%), and BioBERT (60.7%, 68.3%), while enabling cost-effective deployment across various model sizes. The pipeline reduces annotation effort by over 80%, accelerates dataset creation, enables scalable eviction detection, and generalizes to other information extraction tasks.
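The results above are reported as Macro-F1, which is the unweighted mean of per-class F1 scores, so rare categories count as much as common ones. The following is a minimal sketch of that metric, not the authors' evaluation code; the function name `macro_f1` and the label strings are hypothetical and are not taken from the dataset's 14 categories.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every label seen in y_true or y_pred."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative
    f1_scores = []
    for label in labels:
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the denominator is 0
        denom = 2 * tp[label] + fp[label] + fn[label]
        f1_scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


# Hypothetical labels, for illustration only
y_true = ["eviction_current", "eviction_current", "no_eviction", "eviction_history"]
y_pred = ["eviction_current", "no_eviction", "no_eviction", "eviction_history"]
score = macro_f1(y_true, y_pred)
```

In this toy case the per-class F1 values are 2/3, 2/3, and 1, so the macro average is 7/9, or about 77.8%.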

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/dc29/12338912/c171615b0f47/nihpp-2025.07.10.25331237v1-f0005.jpg
