Unveiling social determinants of health impact on adverse pregnancy outcomes through natural language processing.
Author information
Soley Nidhi, Bentil MaKhaila, Shah Jash, Rouhizadeh Masoud, Taylor Casey Overby
Affiliations
Department of Biomedical Engineering, Johns Hopkins University School of Medicine, Baltimore, MD, USA.
Institute for Computational Medicine, Whiting School of Engineering, Johns Hopkins University, Baltimore, MD, USA.
Publication information
Sci Rep. 2025 Aug 9;15(1):29183. doi: 10.1038/s41598-025-13542-x.
Understanding the role of Social Determinants of Health (SDoH) in pregnancy outcomes is critical for improving maternal and infant health, yet extracting SDoH from unstructured electronic health records remains challenging. We trained and evaluated natural language processing (NLP) models for SDoH extraction on clinical notes from the MIMIC-III database (86 notes) and externally evaluated them on the MIMIC-IV database (171 notes) to assess generalizability. Focusing on social support, occupation, and substance use, we compared rule-based, word embedding, and contextual language models. The ClinicalBERT model with a decision tree classifier achieved the highest performance for social support (F1: 0.92), keyword processing performed best for occupation (F1: 0.74), and word embeddings with a random forest classifier performed best for substance use (F1: 0.83). Logistic regression revealed significant associations between pregnancy complications and both substance use (OR 6.47, p < 0.001) and social support (OR 0.07, p < 0.001). Our study demonstrates the feasibility of NLP for SDoH extraction and underscores the clinical relevance of SDoH in maternal health.
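The abstract names three techniques that can be illustrated concretely: a rule-based (keyword) SDoH extractor, the F1 score used to compare extractors, and an odds ratio (OR) of the kind reported from logistic regression. The Python sketch below is a minimal illustration under stated assumptions, not the authors' pipeline. The keyword lexicon, the same-sentence negation rule, the function names, and the toy notes and labels are all hypothetical. The odds ratio is unadjusted; for a single binary predictor it equals exp(β) from a logistic regression, whereas the study's actual model specification is not given here.

```python
import math
import re

# HYPOTHETICAL substance-use lexicon for illustration; not the study's keyword list.
SUBSTANCE_PATTERNS = [
    r"\btobacco\b", r"\bsmok(?:es|er|ing)\b", r"\betoh\b",
    r"\balcohol\b", r"\bheroin\b", r"\bcocaine\b",
]
# Assumed NegEx-style rule: a negation cue earlier in the same sentence cancels a match.
NEGATION = re.compile(r"\b(?:denies|denied|no|never|negative for|without)\b")


def keyword_substance_use(note: str) -> int:
    """Return 1 if a substance-use keyword appears without a preceding negation cue."""
    text = note.lower()
    for pattern in SUBSTANCE_PATTERNS:
        for match in re.finditer(pattern, text):
            # Only look back to the start of the current sentence.
            sentence_start = text.rfind(".", 0, match.start()) + 1
            if not NEGATION.search(text[sentence_start:match.start()]):
                return 1
    return 0


def f1_score(y_true, y_pred):
    """Binary F1: harmonic mean of precision and recall for the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def odds_ratio(exposure, outcome, z=1.96):
    """Unadjusted odds ratio with a Woolf (log-scale Wald) 95% CI.

    With one binary predictor this equals exp(beta) from a logistic
    regression of outcome on exposure; adjusted models need a fitting library.
    """
    a = sum(1 for e, o in zip(exposure, outcome) if e == 1 and o == 1)
    b = sum(1 for e, o in zip(exposure, outcome) if e == 1 and o == 0)
    c = sum(1 for e, o in zip(exposure, outcome) if e == 0 and o == 1)
    d = sum(1 for e, o in zip(exposure, outcome) if e == 0 and o == 0)
    if 0 in (a, b, c, d):
        raise ValueError("empty cell in 2x2 table; OR undefined without correction")
    or_ = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    log_or = math.log(or_)
    return or_, (math.exp(log_or - z * se), math.exp(log_or + z * se))


# Toy, made-up notes and gold labels (1 = substance use documented).
notes = [
    "Smokes 1 ppd. Lives with partner.",
    "Denies tobacco, EtOH or drug use.",
    "History of heroin use, on methadone.",
    "No alcohol during pregnancy.",
    "Former IV drug user.",  # missed: no lexicon term matches
]
gold = [1, 0, 1, 0, 1]
pred = [keyword_substance_use(n) for n in notes]
f1 = f1_score(gold, pred)

# Toy exposure/outcome vectors giving a=5, b=5, c=2, d=8 in the 2x2 table.
exposure = [1] * 10 + [0] * 10
complication = [1] * 5 + [0] * 5 + [1] * 2 + [0] * 8
or_value, ci = odds_ratio(exposure, complication)
print(pred, round(f1, 3), or_value, ci)
```

On the toy data the lexicon misses the fifth note, so recall is 2/3 and F1 is about 0.8, and the toy 2x2 table yields an odds ratio of 4.0. A keyword extractor like this is transparent but misses paraphrases; the word-embedding and ClinicalBERT-based classifiers compared in the study are the learned alternatives.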