Lopez-Garcia Guillermo, Weissenbacher Davy, Stadler Matthew, O'Connor Karen, Xu Dongfang, Gryboski Lauren, Heavens Jared, Abu-El-Rub Noor, Mazzotti Diego R, Chakravorty Subhajit, Gonzalez-Hernandez Graciela
Department of Computational Biomedicine, Cedars-Sinai Medical Center, West Hollywood, CA.
Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA.
medRxiv. 2025 Jun 3:2025.06.02.25328701. doi: 10.1101/2025.06.02.25328701.
Insomnia is a highly prevalent but often underdiagnosed condition in clinical practice. Its inconsistent documentation in electronic health records (EHRs) limits population-level analyses and obstructs efforts to evaluate treatment patterns or outcomes. We present a novel, fully automated approach for phenotyping insomnia directly from unstructured clinical notes using generative large language models (LLMs). Leveraging prompt engineering with few-shot learning and chain-of-thought reasoning, we evaluated our system on two distinct corpora: inpatient clinical notes from MIMIC-III and outpatient primary care notes from the University of Kansas Health System (KUMC). Our models, Llama 70B and Llama 405B, achieved F1 scores of 93.0 on the MIMIC corpus and 85.7 on the KUMC corpus, substantially outperforming domain-adapted BERT-based classifiers. Our framework offers a scalable and interpretable solution for clinical phenotyping of insomnia and can serve as a blueprint for similar efforts targeting other underdiagnosed or under-documented conditions in the EHR.
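As a rough illustration of the kind of pipeline the abstract describes, the following Python sketch assembles a few-shot, chain-of-thought prompt for a note-level insomnia decision, parses a yes/no label from a model completion, and computes F1. It is a minimal sketch under stated assumptions, not the authors' implementation: the instruction wording, the `Example` class, the helper names (`build_prompt`, `parse_label`, `f1_score`), and the two demonstration notes are all hypothetical, and no model is called. F1 is returned on a 0 to 1 scale, whereas the abstract reports it as a percentage.

```python
from dataclasses import dataclass


@dataclass
class Example:
    """One hypothetical few-shot demonstration (not taken from MIMIC-III or KUMC)."""
    note: str
    reasoning: str
    label: str  # "yes" or "no"


# Hypothetical instruction text; the actual prompts used in the study are not given here.
INSTRUCTION = (
    "Decide whether the clinical note documents insomnia. "
    "Reason step by step, then finish with 'Answer: yes' or 'Answer: no'."
)


def build_prompt(examples: list[Example], note: str) -> str:
    """Concatenate the instruction, worked examples, and the target note."""
    parts = [INSTRUCTION]
    for ex in examples:
        parts.append(f"Note: {ex.note}\nReasoning: {ex.reasoning}\nAnswer: {ex.label}")
    # Leave the reasoning open so the model produces its own chain of thought.
    parts.append(f"Note: {note}\nReasoning:")
    return "\n\n".join(parts)


def parse_label(completion: str) -> bool:
    """Return True if the last 'Answer:' line in the completion says yes."""
    for line in reversed(completion.strip().splitlines()):
        text = line.strip().lower()
        if text.startswith("answer:"):
            return text.split(":", 1)[1].strip() == "yes"
    return False  # Assumption: treat a missing answer as a negative prediction.


def f1_score(gold: list[bool], pred: list[bool]) -> float:
    """Binary F1 for the positive (insomnia) class, on a 0 to 1 scale."""
    tp = sum(g and p for g, p in zip(gold, pred))
    fp = sum((not g) and p for g, p in zip(gold, pred))
    fn = sum(g and (not p) for g, p in zip(gold, pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Invented demonstrations, for illustration only.
FEW_SHOT = [
    Example(
        note="Patient reports difficulty falling asleep most nights; started trazodone.",
        reasoning="Sleep-onset difficulty is described and a hypnotic was started.",
        label="yes",
    ),
    Example(
        note="Sleeping well. No complaints today.",
        reasoning="The note states normal sleep and no sleep medication.",
        label="no",
    ),
]
```

In use, the string returned by `build_prompt(FEW_SHOT, note)` would be sent to whichever LLM is available, and the completion passed to `parse_label`; keeping the generated reasoning alongside the label is one way such an approach can remain interpretable.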