Large Language Models for Social Determinants of Health Information Extraction from Clinical Notes - A Generalizable Approach across Institutions.

Authors

Keloth Vipina K, Selek Salih, Chen Qingyu, Gilman Christopher, Fu Sunyang, Dang Yifang, Chen Xinghan, Hu Xinyue, Zhou Yujia, He Huan, Fan Jungwei W, Wang Karen, Brandt Cynthia, Tao Cui, Liu Hongfang, Xu Hua

Affiliations

Department of Biomedical Informatics and Data Science, Yale School of Medicine, New Haven, CT, USA.

Department of Psychiatry and Behavioral Sciences, UTHealth McGovern Medical School, Houston, TX, USA.

Publication information

medRxiv. 2024 May 22:2024.05.21.24307726. doi: 10.1101/2024.05.21.24307726.

DOI: 10.1101/2024.05.21.24307726
PMID: 38826441
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11142292/
Abstract

The consistent and persuasive evidence of the influence of social determinants on health has prompted a growing realization throughout the health care sector that improving health and health equity will likely depend, at least in part, on addressing detrimental social determinants. However, detailed social determinants of health (SDoH) information is often buried in the clinical narrative text of electronic health records (EHRs), so natural language processing (NLP) methods are needed to extract these details automatically. Most current NLP efforts for SDoH extraction have been limited in scope: they investigate only a few types of SDoH elements, derive data from a single institution, or focus on specific patient cohorts or note types, with little attention to generalizability. This study addresses these issues by creating cross-institutional corpora spanning different note types and healthcare systems, and by developing classification models, including novel large language models (LLMs), and evaluating their generalizability for detecting SDoH factors in diverse types of notes from four institutions: Harris County Psychiatric Center, University of Texas Physician Practice, Beth Israel Deaconess Medical Center, and Mayo Clinic. Four corpora of deidentified clinical notes were annotated with 21 SDoH factors at two levels: level 1 with SDoH factor types only, and level 2 with SDoH factors along with their associated values. Three traditional classification algorithms (XGBoost, TextCNN, Sentence BERT) and an instruction-tuned LLM-based approach (LLaMA) were developed to identify multiple SDoH factors. Substantial variation was observed in SDoH documentation practices and label distributions across patient cohorts, note types, and hospitals. The LLM achieved the top performance, with micro-averaged F1 scores over 0.9 on the level 1 annotated corpora and over 0.84 on the level 2 annotated corpora. While the models performed well when trained and tested on individual datasets, cross-dataset evaluation revealed remaining obstacles to generalization. To foster collaboration, parts of the annotated corpora, together with models trained on all annotated datasets merged, will be made available on the PhysioNet repository.
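The micro-averaged F1 reported above pools true positives, false positives, and false negatives over every note and every label before computing precision and recall. The minimal sketch below shows that computation at both annotation levels, assuming level-1 labels are a set of SDoH factor types per note and level-2 labels are a set of (factor, value) pairs that earn credit only when both parts match. The helper names `micro_f1` and `to_level1`, the factor and value names (`housing`, `homeless`, and so on), and the toy data are hypothetical illustrations, not the paper's label schema, matching rules, or code.

```python
from typing import AbstractSet, Hashable, Sequence, Set, Tuple

# Hypothetical label schema for illustration only: the factor names and values
# below are NOT the paper's 21 SDoH factors or its value vocabulary.
Level2Labels = Set[Tuple[str, str]]  # {(factor, value), ...} for one note


def micro_f1(gold: Sequence[AbstractSet[Hashable]],
             pred: Sequence[AbstractSet[Hashable]]) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall, and F1 for multi-label predictions.

    TP/FP/FN counts are pooled over every note and every label before the
    ratios are computed, so frequent labels carry more weight than rare ones.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred need one label set per note")
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        tp += len(g & p)  # labels predicted and present in gold
        fp += len(p - g)  # labels predicted but absent from gold
        fn += len(g - p)  # gold labels the model missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def to_level1(pairs: Level2Labels) -> Set[str]:
    """Drop the values, keeping only the SDoH factor types (level 1)."""
    return {factor for factor, _value in pairs}


# Toy gold and predicted level-2 annotations for two notes (invented data).
gold_l2 = [
    {("housing", "homeless"), ("employment", "unemployed")},
    {("alcohol_use", "current")},
]
pred_l2 = [
    {("housing", "homeless"), ("employment", "employed")},  # wrong value
    {("alcohol_use", "current"), ("tobacco_use", "former")},  # extra label
]

_, _, f1_level2 = micro_f1(gold_l2, pred_l2)
_, _, f1_level1 = micro_f1([to_level1(g) for g in gold_l2],
                           [to_level1(p) for p in pred_l2])

print(f"level 1 micro-F1: {f1_level1:.3f}")  # level 1 micro-F1: 0.857
print(f"level 2 micro-F1: {f1_level2:.3f}")  # level 2 micro-F1: 0.571
```

In this toy example the wrong value (`employed` instead of `unemployed`) still counts as a correct factor type at level 1 but as both a false positive and a false negative at level 2, so the same predictions score lower under the stricter level-2 matching.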

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c9db/11142292/73376f47bf9d/nihpp-2024.05.21.24307726v1-f0001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c9db/11142292/93ca9d5ac2aa/nihpp-2024.05.21.24307726v1-f0002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c9db/11142292/53d676dcf91b/nihpp-2024.05.21.24307726v1-f0003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c9db/11142292/f859a53ce9e9/nihpp-2024.05.21.24307726v1-f0004.jpg

Similar articles

1
Large Language Models for Social Determinants of Health Information Extraction from Clinical Notes - A Generalizable Approach across Institutions.
medRxiv. 2024 May 22:2024.05.21.24307726. doi: 10.1101/2024.05.21.24307726.
2
Social determinants of health extraction from clinical notes across institutions using large language models.
NPJ Digit Med. 2025 May 17;8(1):287. doi: 10.1038/s41746-025-01645-8.
3
Classifying social determinants of health from unstructured electronic health records using deep learning-based natural language processing.
J Biomed Inform. 2022 Mar;127:103984. doi: 10.1016/j.jbi.2021.103984. Epub 2022 Jan 7.
4
Identifying social determinants of health from clinical narratives: A study of performance, documentation ratio, and potential bias.
J Biomed Inform. 2024 May;153:104642. doi: 10.1016/j.jbi.2024.104642. Epub 2024 Apr 14.
5
Scalable information extraction from free text electronic health records using large language models.
BMC Med Res Methodol. 2025 Jan 28;25(1):23. doi: 10.1186/s12874-025-02470-z.
6
Extracting social determinants of health events with transformer-based multitask, multilabel named entity recognition.
J Am Med Inform Assoc. 2023 Jul 19;30(8):1379-1388. doi: 10.1093/jamia/ocad046.
7
Natural language processing to identify social determinants of health in Alzheimer's disease and related dementia from electronic health records.
Health Serv Res. 2023 Dec;58(6):1292-1302. doi: 10.1111/1475-6773.14210. Epub 2023 Aug 3.
8
Extracting social determinants of health from clinical note text with classification and sequence-to-sequence approaches.
J Am Med Inform Assoc. 2023 Jul 19;30(8):1448-1455. doi: 10.1093/jamia/ocad071.
9
Barriers and Facilitators of Obtaining Social Determinants of Health of Patients With Cancer Through the Electronic Health Record Using Natural Language Processing Technology: Qualitative Feasibility Study With Stakeholder Interviews.
JMIR Form Res. 2022 Dec 27;6(12):e43059. doi: 10.2196/43059.
10
A marker-based neural network system for extracting social determinants of health.
J Am Med Inform Assoc. 2023 Jul 19;30(8):1398-1407. doi: 10.1093/jamia/ocad041.
