Gabriel Rodney A, Litake Onkar, Simpson Sierra, Burton Brittany N, Waterman Ruth S, Macias Alvaro A
Division of Perioperative Informatics, Department of Anesthesiology, University of California, San Diego, La Jolla, CA 92037.
Department of Biomedical Informatics, University of California, San Diego Health, La Jolla, CA 92037.
Proc Natl Acad Sci U S A. 2024 Sep 24;121(39):e2320716121. doi: 10.1073/pnas.2320716121. Epub 2024 Sep 16.
The assessment of social determinants of health (SDoH) within healthcare systems is crucial for comprehensive patient care and for addressing health disparities. A current challenge is the limited inclusion of structured SDoH information within electronic health record (EHR) systems, often due to the lack of standardized diagnosis codes. This study explores the potential of large language models (LLMs) to overcome this challenge. LLM-based classifiers using Bidirectional Encoder Representations from Transformers (BERT) and A Robustly Optimized BERT Pretraining Approach (RoBERTa) were developed to detect SDoH concepts, including homelessness, food insecurity, and domestic violence. The training data combined synthetic notes generated by generative pre-trained transformers with authentic clinical notes. The models were then validated on two separate datasets: Medical Information Mart for Intensive Care-III (MIMIC-III) and our institutional EHR data. When trained on a combination of synthetic and authentic notes, the models achieved an area under the receiver operating characteristic curve (AUROC) on our institutional dataset of 0.78 for detecting homelessness, 0.72 for detecting food insecurity, and 0.83 for detecting domestic violence. This study underscores the potential of LLMs for extracting SDoH information from clinical text. Automated detection of SDoH may help healthcare providers identify at-risk patients, guide targeted interventions, and contribute to population health initiatives aimed at mitigating disparities.
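The AUROC values reported above summarize how well a classifier's scores rank notes that mention an SDoH concept above notes that do not. As an illustrative sketch only (not the authors' evaluation code), AUROC can be computed from binary labels and model scores through its equivalence with the Mann-Whitney U statistic. The function name `auroc` and the toy labels and scores are hypothetical and are not drawn from the study's data.

```python
from typing import Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via the Mann-Whitney U statistic.

    Equals the probability that a randomly chosen positive example
    (label 1) receives a higher score than a randomly chosen negative
    example (label 0); tied scores count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")

    wins = 0.0
    # Compare every positive score against every negative score.
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical toy example: 1 = note mentions the SDoH concept.
    toy_labels = [1, 0, 1, 0]
    toy_scores = [0.8, 0.6, 0.5, 0.2]
    print(auroc(toy_labels, toy_scores))  # 0.75
```

The pairwise comparison is quadratic in the number of notes, which keeps the sketch easy to verify; for large validation sets a rank-based formulation or a library routine would be the usual choice.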