Social determinants of health extraction from clinical notes across institutions using large language models.
Author Information
Keloth Vipina K, Selek Salih, Chen Qingyu, Gilman Christopher, Fu Sunyang, Dang Yifang, Chen Xinghan, Hu Xinyue, Zhou Yujia, He Huan, Fan Jungwei W, Wang Karen, Brandt Cynthia, Tao Cui, Liu Hongfang, Xu Hua
Affiliations
Department of Biomedical Informatics and Data Science, Yale School of Medicine, New Haven, CT, USA.
Department of Psychiatry and Behavioral Sciences, UTHealth McGovern Medical School, Houston, TX, USA.
Publication Information
NPJ Digit Med. 2025 May 17;8(1):287. doi: 10.1038/s41746-025-01645-8.
Detailed social determinants of health (SDoH) are often buried within clinical text in EHRs. Most current NLP efforts for SDoH have limitations: they investigate a limited set of factors, derive data from a single institution, use specific patient cohorts or note types, and place little focus on generalizability. We aim to address these issues by creating cross-institutional corpora and by developing and evaluating the generalizability of classification models, including large language models (LLMs), for detecting SDoH factors using data from four institutions. Clinical notes were annotated with 21 SDoH factors at two levels: level 1 (SDoH factors only) and level 2 (SDoH factors and associated values). Compared to other models, the instruction-tuned LLM achieved top performance, with micro-averaged F1 over 0.9 on level 1 corpora and over 0.84 on level 2 corpora. While models performed well when trained and tested on individual datasets, cross-dataset generalization highlighted remaining obstacles. The trained models will be made available at https://github.com/BIDS-Xu-Lab/LLMs4SDoH.
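The reported micro-averaged F1 pools true/false positives and negatives across all 21 SDoH factors rather than averaging per-factor scores. The following is a minimal illustrative sketch only, not the authors' code: the factor names and predictions are hypothetical, and it simply shows how a micro-averaged F1 could be computed for multi-label SDoH factor detection (level 1) using scikit-learn.

# Illustrative sketch (not from the paper): micro-averaged F1 for
# multi-label SDoH factor detection. All labels and predictions below
# are hypothetical examples.
from sklearn.metrics import f1_score
from sklearn.preprocessing import MultiLabelBinarizer

# Hypothetical gold and predicted SDoH factors for three clinical notes.
gold = [
    {"employment_status", "tobacco_use"},
    {"housing_instability"},
    {"alcohol_use", "living_situation"},
]
pred = [
    {"employment_status"},
    {"housing_instability"},
    {"alcohol_use", "living_situation", "tobacco_use"},
]

# Binarize over the union of observed factor labels; micro-averaging then
# pools true positives, false positives, and false negatives across factors.
mlb = MultiLabelBinarizer()
mlb.fit(gold + pred)
y_true = mlb.transform(gold)
y_pred = mlb.transform(pred)

print(f1_score(y_true, y_pred, average="micro"))  # 0.80 for this toy data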