Departments of Population Health and Medicine, Grossman School of Medicine, New York University, New York City, New York, USA.
Center for Data Science, New York University, New York City, New York, USA.
Health Serv Res. 2023 Dec;58(6):1292-1302. doi: 10.1111/1475-6773.14210. Epub 2023 Aug 3.
OBJECTIVE: To develop a natural language processing (NLP) algorithm that identifies social determinants of health (SDoH) in unstructured electronic health records (EHRs) for patients with Alzheimer's disease and related dementias (ADRD). The SDoH domains covered are housing, transportation, food, and medication insecurities; social isolation; abuse, neglect, or exploitation; and financial difficulties.
DATA SOURCES AND STUDY SETTING: We used 1000 medical notes randomly selected from 7401 emergency department and inpatient social worker notes generated between 2015 and 2019 for 231 unique patients diagnosed with ADRD at Michigan Medicine.
STUDY DESIGN: We developed a rule-based NLP algorithm to identify the seven SDoH domains noted above and compared it with deep learning and regularized logistic regression approaches. The 1000 notes were split into 700 notes for training the NLP algorithms and 300 notes for validation. The models were compared on accuracy, sensitivity, specificity, F1 score, and the area under the receiver operating characteristic curve (AUC).
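As an illustration of what a rule-based approach of this kind can look like, the following is a minimal sketch only. The keyword patterns, the negation cues, and the function name `tag_note` are all hypothetical; the rules actually used in the study are not given in this abstract.

```python
import re

# Hypothetical keyword patterns for the seven SDoH domains.
# These are illustrative assumptions, not the study's rules.
SDOH_PATTERNS = {
    "housing": [r"\bhomeless(ness)?\b", r"\bevict(ed|ion)\b"],
    "transportation": [r"\bno (reliable )?transportation\b", r"\bunable to drive\b"],
    "food": [r"\bfood (insecurity|pantry|stamps)\b"],
    "medication": [r"\bcannot afford (his |her |their )?medications?\b"],
    "social_isolation": [r"\blives alone\b", r"\bsocially isolated\b"],
    "abuse_neglect_exploitation": [r"\b(abuse|neglect|exploitation)\b"],
    "financial": [r"\bfinancial (difficult(y|ies)|strain|hardship)\b"],
}

COMPILED = {
    domain: [re.compile(p, re.IGNORECASE) for p in patterns]
    for domain, patterns in SDOH_PATTERNS.items()
}

# Hypothetical negation cues; a hit is ignored if one of these precedes it
# in the same sentence.
NEGATION_CUE = re.compile(r"\b(denies|denied|no evidence of|no signs of)\b", re.IGNORECASE)


def tag_note(text):
    """Return the set of SDoH domains with a non-negated keyword hit in the note."""
    found = set()
    # Naive sentence split on terminal punctuation followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        for domain, patterns in COMPILED.items():
            for pattern in patterns:
                for match in pattern.finditer(sentence):
                    preceding = sentence[:match.start()]
                    if not NEGATION_CUE.search(preceding):
                        found.add(domain)
    return found


note = "Patient lives alone and cannot afford her medications. Denies abuse or neglect."
print(sorted(tag_note(note)))
```

In a sketch like this, each domain is flagged independently, so one note can carry several SDoH labels. Real clinical text would need far richer patterns and negation handling than shown here.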
DATA COLLECTION/EXTRACTION METHODS: Social worker notes used in this study were extracted from the Michigan Medicine EHR database.
PRINCIPAL FINDINGS: On the 700 training notes, F1 and AUC for the rule-based algorithm were at least 0.94 and 0.95, respectively, for all SDoH categories. On the 300 validation notes, F1 and AUC were at least 0.80 and 0.97, respectively, for all SDoH categories except housing and medication insecurities. The deep learning and regularized logistic regression algorithms had unsatisfactory performance.
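For readers who want to see how the reported metrics relate to one another, the sketch below computes accuracy, sensitivity, specificity, and F1 for one SDoH domain from binary note-level labels. The function name `binary_metrics` and the toy labels are hypothetical, not study data. For a classifier that outputs hard 0/1 labels, the ROC curve has a single operating point and its trapezoidal AUC equals the mean of sensitivity and specificity; whether the study computed AUC this way is not stated in the abstract.

```python
def binary_metrics(y_true, y_pred):
    """Compute note-level metrics for one SDoH domain from 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0  # recall on positives
    specificity = tn / (tn + fp) if (tn + fp) else 0.0  # recall on negatives
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0

    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
        # AUC of a single-threshold (hard-label) classifier.
        "auc": (sensitivity + specificity) / 2,
    }


# Toy labels for illustration only (1 = domain present in the note).
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
print(binary_metrics(y_true, y_pred))
```

Because SDoH mentions are often rare in clinical notes, F1 and sensitivity are usually more informative than accuracy alone, which can look high even when few positive notes are found.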
CONCLUSIONS: The rule-based algorithm can accurately extract SDoH information in five of the seven SDoH domains, the exceptions being housing and medication insecurities. Findings from the algorithm can be used by clinicians and social workers to proactively address the social needs of patients with ADRD and other vulnerable patient populations.