Hatef Elham, Kitchen Christopher, Gray Geoffrey M, Zirikly Ayah, Richards Thomas, Ahumada Luis M, Weiner Jonathan P
Division of General Internal Medicine, Department of Medicine, Johns Hopkins School of Medicine, Baltimore, MD 21205, United States.
Center for Population Health Information Technology, Department of Health Policy and Management, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD 21205, United States.
JAMIA Open. 2024 Oct 29;7(4):ooae117. doi: 10.1093/jamiaopen/ooae117. eCollection 2024 Dec.
To improve the performance of a social risk score (a predictive risk model) using electronic health record (EHR) structured and unstructured data.
We used EPIC-based EHR data from July 2016 to June 2021 and linked them to community-level data from the US Census American Community Survey. We identified predictors of interest within the EHR structured data and applied natural language processing (NLP) techniques to identify patients' social needs in the EHR unstructured data. We fit logistic regression models without and with information from the unstructured data (Models I and II, respectively) and compared their performance with generalized estimating equation (GEE) models without and with the unstructured data (Models III and IV, respectively).
The logistic model (Model I) performed well (area under the curve [AUC] 0.703, 95% confidence interval [CI] 0.701-0.705), and the addition of EHR unstructured data (Model II) resulted in a slight decrease in the AUC (0.701, 95% CI 0.699-0.703). In the logistic models, the addition of EHR unstructured data increased the area under the precision-recall curve (PRC) from 0.255 (95% CI 0.254-0.256) in Model I to 0.378 (95% CI 0.375-0.38) in Model II. The GEE models performed similarly to the logistic models, and the addition of EHR unstructured data again resulted in a slight decrease in the AUC (0.702, 95% CI 0.699-0.705 in Model III versus 0.699, 95% CI 0.698-0.702 in Model IV).
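The pattern reported here, a nearly unchanged AUC alongside a higher PRC, can be illustrated with a minimal sketch. The labels and scores below are toy values invented for illustration and are not study data; the function names `roc_auc` and `average_precision` are hypothetical helpers, and the step-wise average precision is only one common way to summarize the precision-recall curve, not necessarily the one the authors used.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Step-wise area under the precision-recall curve (average precision)."""
    # Rank cases from highest to lowest predicted risk.
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    total_pos = sum(labels)
    hits, ap = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            ap += hits / rank  # precision at the rank of each positive case
    return ap / total_pos


# Toy cohort (assumption): 2 positive cases among 10 patients.
labels = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]

# Hypothetical model A: both positives ranked 3rd and 4th.
scores_a = [0.8, 0.7, 1.0, 0.9, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1]
# Hypothetical model B: one positive ranked 1st, the other 6th.
scores_b = [1.0, 0.5, 0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]

auc_a, auc_b = roc_auc(labels, scores_a), roc_auc(labels, scores_b)
ap_a, ap_b = average_precision(labels, scores_a), average_precision(labels, scores_b)

print(f"Model A: AUC={auc_a:.3f}, AP={ap_a:.3f}")
print(f"Model B: AUC={auc_b:.3f}, AP={ap_b:.3f}")
```

In this toy case both models have the same AUC, but model B places a positive case at the very top of the ranking and so has the higher average precision. When positives are rare, the precision-recall summary is more sensitive than the AUC to how the highest-risk cases are ranked.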
Our work enhances a novel social risk score that integrates community-level data with patient-level data to systematically identify patients at increased risk of future social needs, so that they can receive an in-depth assessment of those needs and, where appropriate, referral to community-based organizations that address them.
Adding information on social needs extracted from unstructured EHR data improved the prediction of positive cases, as reflected by the increase in the area under the precision-recall curve.