On the development and validation of large language model-based classifiers for identifying social determinants of health.

Authors

Gabriel Rodney A, Litake Onkar, Simpson Sierra, Burton Brittany N, Waterman Ruth S, Macias Alvaro A

Affiliations

Division of Perioperative Informatics, Department of Anesthesiology, University of California, San Diego, La Jolla, CA 92037.

Department of Biomedical Informatics, University of California, San Diego Health, La Jolla, CA 92037.

Publication information

Proc Natl Acad Sci U S A. 2024 Sep 24;121(39):e2320716121. doi: 10.1073/pnas.2320716121. Epub 2024 Sep 16.

DOI:10.1073/pnas.2320716121

PMID:39284061

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11441499/

Abstract

The assessment of social determinants of health (SDoH) within healthcare systems is crucial for comprehensive patient care and addressing health disparities. Current challenges arise from the limited inclusion of structured SDoH information within electronic health record (EHR) systems, often due to the lack of standardized diagnosis codes. This study examines the potential of large language models (LLMs) to overcome these challenges. LLM-based classifiers, using Bidirectional Encoder Representations from Transformers (BERT) and A Robustly Optimized BERT Pretraining Approach (RoBERTa), were developed for SDoH concepts, including homelessness, food insecurity, and domestic violence, using synthetic training datasets generated by generative pre-trained transformers combined with authentic clinical notes. Models were then validated on separate datasets: Medical Information Mart for Intensive Care-III and our institutional EHR data. When training the model with a combination of synthetic and authentic notes, validation on our institutional dataset yielded an area under the receiver operating characteristic curve of 0.78 for detecting homelessness, 0.72 for detecting food insecurity, and 0.83 for detecting domestic violence. This study underscores the potential of LLMs in extracting SDoH information from clinical text. Automated detection of SDoH may be instrumental for healthcare providers in identifying at-risk patients, guiding targeted interventions, and contributing to population health initiatives aimed at mitigating disparities.
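The validation metric reported in the abstract, the area under the receiver operating characteristic curve (AUROC), can be read as the probability that a randomly chosen positive note receives a higher classifier score than a randomly chosen negative note. The following is a minimal sketch of that computation, not the authors' evaluation code: the function name `auroc` and the toy labels and scores are assumptions made up for illustration and do not come from the study.

```python
from typing import Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Return the AUROC of binary labels (1 = positive) against scores.

    Computed as the fraction of (positive, negative) pairs in which the
    positive example has the higher score; tied pairs count as half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data, invented for illustration only (e.g. 1 = note mentions homelessness).
labels = [1, 0, 1, 0, 0, 1]
scores = [0.9, 0.2, 0.6, 0.7, 0.1, 0.8]

print(f"AUROC = {auroc(labels, scores):.3f}")
```

On this toy input, 8 of the 9 positive-negative pairs are ranked correctly, so the value is 8/9. The pairwise loop is quadratic in the number of notes; for real datasets a rank-based implementation from an established library would be the more practical choice.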

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8c67/11441499/1bf5ed7d5f0a/pnas.2320716121fig01.jpg

Similar articles

Classifying social determinants of health from unstructured electronic health records using deep learning-based natural language processing.

J Biomed Inform. 2022 Mar;127:103984. doi: 10.1016/j.jbi.2021.103984. Epub 2022 Jan 7.

A Large Language Model Screening Tool to Target Patients for Best Practice Alerts: Development and Validation.

JMIR Med Inform. 2023 Nov 27;11:e49886. doi: 10.2196/49886.

Scalable information extraction from free text electronic health records using large language models.

BMC Med Res Methodol. 2025 Jan 28;25(1):23. doi: 10.1186/s12874-025-02470-z.

Using Large Language Models to Abstract Complex Social Determinants of Health From Original and Deidentified Medical Notes: Development and Validation Study.

J Med Internet Res. 2024 Nov 19;26:e63445. doi: 10.2196/63445.

Large Language Models for Social Determinants of Health Information Extraction from Clinical Notes - A Generalizable Approach across Institutions.

medRxiv. 2024 May 22:2024.05.21.24307726. doi: 10.1101/2024.05.21.24307726.

Leveraging natural language processing to augment structured social determinants of health data in the electronic health record.

J Am Med Inform Assoc. 2023 Jul 19;30(8):1389-1397. doi: 10.1093/jamia/ocad073.

Measuring the Value of a Practical Text Mining Approach to Identify Patients With Housing Issues in the Free-Text Notes in Electronic Health Record: Findings of a Retrospective Cohort Study.

Front Public Health. 2021 Aug 27;9:697501. doi: 10.3389/fpubh.2021.697501. eCollection 2021.

Autonomous International Classification of Diseases Coding Using Pretrained Language Models and Advanced Prompt Learning Techniques: Evaluation of an Automated Analysis System Using Medical Text.

JMIR Med Inform. 2025 Jan 6;13:e63020. doi: 10.2196/63020.

Evaluation of a Natural Language Processing Approach to Identify Social Determinants of Health in Electronic Health Records in a Diverse Community Cohort.

Med Care. 2022 Mar 1;60(3):248-255. doi: 10.1097/MLR.0000000000001683.

Cited by

Research progress and implications of the application of large language model in shared decision-making in China's healthcare field.

Front Public Health. 2025 Jul 10;13:1605212. doi: 10.3389/fpubh.2025.1605212. eCollection 2025.

Social determinants of health extraction from clinical notes across institutions using large language models.

NPJ Digit Med. 2025 May 17;8(1):287. doi: 10.1038/s41746-025-01645-8.

Reply to Wang: Improving large language model approaches for identifying social determinants of health from clinical notes.

Proc Natl Acad Sci U S A. 2025 Apr;122(13):e2503187122. doi: 10.1073/pnas.2503187122. Epub 2025 Mar 20.

Large language models for identifying social determinants of health.

Proc Natl Acad Sci U S A. 2025 Apr;122(13):e2501506122. doi: 10.1073/pnas.2501506122. Epub 2025 Mar 20.

SBDH-Reader: an LLM-powered method for extracting social and behavioral determinants of health from medical notes.

medRxiv. 2025 Feb 21:2025.02.19.25322576. doi: 10.1101/2025.02.19.25322576.
