Ridgway Jessica P, Mason Joseph A, Friedman Eleanor E, Oliwa Tomasz, Flores John, Simon Jodi, Ekong Abbey, Yang Ta-Yun, Schneider John A
Department of Medicine, University of Chicago, Chicago, IL 60637, United States.
Center for Research Informatics, University of Chicago, Chicago, IL 60637, United States.
JAMIA Open. 2025 Jul 24;8(4):ooaf077. doi: 10.1093/jamiaopen/ooaf077. eCollection 2025 Aug.
To compare different machine learning models of loss to follow-up among people with HIV (PWH).
Using electronic medical record (EMR) data from 7340 PWH at a federally qualified health center, we developed machine learning models to predict loss to follow-up in HIV care. Unstructured text from clinical notes was analyzed using Bag of Words and Word Embedding natural language processing (NLP) approaches.
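The feature construction described above can be illustrated with a minimal sketch, assuming a simple lowercase word tokenizer and toy data. The abstract does not specify the tokenization, vocabulary size, or structured variables used, so the function names (`tokenize`, `build_vocabulary`, `bag_of_words`, `combine_features`), the example notes, and the two structured columns are all hypothetical. The sketch only shows how Bag of Words counts from a note can be appended to a row of structured EMR values before a classifier such as a random forest is trained on the combined matrix.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep runs of letters as tokens (an assumed, simple tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def build_vocabulary(notes):
    """Map every distinct token in the corpus to a column index, sorted for determinism."""
    tokens = sorted({tok for note in notes for tok in tokenize(note)})
    return {tok: i for i, tok in enumerate(tokens)}


def bag_of_words(note, vocab):
    """Return the count of each vocabulary token in one note, in vocabulary order."""
    counts = Counter(tokenize(note))
    return [counts.get(tok, 0) for tok in vocab]


def combine_features(structured_row, note, vocab):
    """Concatenate structured values with the note's Bag of Words counts."""
    return list(structured_row) + bag_of_words(note, vocab)


# Hypothetical toy data, not taken from the study.
notes = [
    "Patient missed visit; transportation barrier",
    "Patient attended visit, adherent to ART",
]
structured = [[34, 1], [52, 0]]  # hypothetical columns: age, prior missed visit flag

vocab = build_vocabulary(notes)
X = [combine_features(row, note, vocab) for row, note in zip(structured, notes)]
```

Each row of `X` holds the structured columns first and one count per vocabulary word after them, which is the kind of combined input a "structured data and Bag of Words" model would consume.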
A random forest model using structured data and Bag of Words (area under the receiver operating characteristic curve [AUC], 0.787; 95% CI, 0.776-0.798) outperformed both a random forest model using structured data alone (AUC, 0.753; 95% CI, 0.741-0.765) and a random forest model using Bag of Words alone (AUC, 0.624; 95% CI, 0.610-0.638).
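For readers unfamiliar with how an AUC and its 95% CI can be obtained, the following is a minimal sketch. The abstract does not state how the intervals were computed; the percentile bootstrap used here is an assumption chosen for illustration, and the labels, scores, and function names (`auc`, `bootstrap_ci`) are hypothetical.

```python
import random


def auc(labels, scores):
    """AUC as the probability that a random positive outscores a random negative.

    Ties between a positive and a negative count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def bootstrap_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUC (an assumed method, not the study's)."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # a resample with only one class has no defined AUC
        stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Hypothetical labels (1 = lost to follow-up) and predicted risk scores.
labels = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2, 0.7, 0.6]

point = auc(labels, scores)
lower, upper = bootstrap_ci(labels, scores)
```

With a sample as small as this toy one, the interval is very wide; the narrow intervals reported above reflect the much larger cohort.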
A model using both structured EMR data and NLP of unstructured clinical notes performed better than models using structured EMR data alone or NLP alone in predicting loss to follow-up from HIV care.