Liu Yiyang, Chen Aokun, Cho Hwayoung, Siddiqi Khairul A, Cook Robert L, Prosperi Mattia
Department of Epidemiology, College of Medicine and College of Public Health and Health Professions, University of Florida, 2004 Mowry Road, PO Box 100231, Gainesville, FL, 32610-0231, USA.
Department of Health Outcomes & Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL, USA.
BMC Public Health. 2025 Jul 2;25(1):2257. doi: 10.1186/s12889-025-23460-2.
Human Immunodeficiency Virus (HIV) pre-exposure prophylaxis (PrEP) prevents HIV transmission but has low uptake among women. Identifying women who could benefit from PrEP remains a challenge. This study developed a women-specific model to predict HIV risk within a year using electronic health record (EHR) data and social determinants of health (SDoH).
We conducted a case-control study using EHR and claims data from a centralized patient repository in the Southeastern United States (OneFlorida+). The dataset was split into 60% training, 30% testing, and 10% calibration. Five-fold cross-validation was applied for hyperparameter tuning. Contextual-level SDoH were linked to the EHR/claims data. Various machine learning (ML) methods were tested, and Shapley Additive Explanations (SHAP) values were used to interpret the model.
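The 60/30/10 partition described above can be sketched as follows. This is a minimal illustration, not the study's code: the abstract does not say whether the split was stratified by case status, so stratification is an assumption here, and the function name `stratified_split`, the seed, and the toy labels are hypothetical.

```python
import random
from collections import defaultdict


def stratified_split(labels, fractions=(0.6, 0.3, 0.1), seed=0):
    """Return (train, test, calibration) index lists.

    Each class is shuffled and cut separately so that all three
    partitions keep roughly the same case/control ratio.
    Stratification is an assumption; the abstract does not specify it.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)

    train, test, calib = [], [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n = len(indices)
        n_train = round(fractions[0] * n)
        n_test = round(fractions[1] * n)
        train.extend(indices[:n_train])
        test.extend(indices[n_train:n_train + n_test])
        # Whatever remains goes to the calibration partition.
        calib.extend(indices[n_train + n_test:])
    return train, test, calib


# Toy labels (100 cases, 900 controls); these are not the study data.
labels = [1] * 100 + [0] * 900
train_idx, test_idx, calib_idx = stratified_split(labels)
print(len(train_idx), len(test_idx), len(calib_idx))
```

In a workflow like the one described, hyperparameter tuning by five-fold cross-validation would run inside the training partition only, leaving the testing and calibration partitions untouched until the model is fixed.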
Our sample included 1,458 women newly diagnosed with HIV and 33,155 controls who had never been diagnosed. The XGBoost model outperformed other ML methods, achieving an area under the curve (AUC) of 89.3%. At the optimal Youden's index cutoff, which identified 20% of women as high risk, sensitivity was 83% and specificity was 82%; at the optimal F1 score cutoff, which identified 5% as high risk, sensitivity was 42% and specificity was 97%. Of the 20 features with the highest SHAP values, 11 were related to SDoH.
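The two cutoffs reported above come from different criteria: Youden's index J = sensitivity + specificity - 1, and the F1 score 2TP / (2TP + FP + FN). The sketch below shows, on made-up scores, why the F1 cutoff tends to be stricter when cases are rare. The helper `best_cutoffs` and the toy data are hypothetical and do not reproduce the study's model or results.

```python
def best_cutoffs(y_true, scores):
    """Scan every observed score as a threshold and return the cutoffs
    that maximize Youden's J and the F1 score, respectively.

    A sample is flagged high risk when its score >= threshold.
    Assumes y_true contains at least one case (1) and one control (0).
    """
    best_j, youden_cutoff = -1.0, None
    best_f1, f1_cutoff = -1.0, None
    for t in sorted(set(scores)):
        tp = fp = tn = fn = 0
        for y, s in zip(y_true, scores):
            flagged = s >= t
            if flagged and y == 1:
                tp += 1
            elif flagged and y == 0:
                fp += 1
            elif not flagged and y == 1:
                fn += 1
            else:
                tn += 1
        sensitivity = tp / (tp + fn)
        specificity = tn / (tn + fp)
        j = sensitivity + specificity - 1
        f1 = 2 * tp / (2 * tp + fp + fn)
        # Strict comparison keeps the lowest threshold on ties.
        if j > best_j:
            best_j, youden_cutoff = j, t
        if f1 > best_f1:
            best_f1, f1_cutoff = f1, t
    return youden_cutoff, f1_cutoff


# Toy data: 3 cases and 12 controls with illustrative risk scores.
case_scores = [0.3, 0.8, 0.9]
control_scores = [0.02, 0.05, 0.08, 0.1, 0.12, 0.15,
                  0.18, 0.2, 0.25, 0.4, 0.5, 0.6]
y_true = [1] * len(case_scores) + [0] * len(control_scores)
scores = case_scores + control_scores

youden_cutoff, f1_cutoff = best_cutoffs(y_true, scores)
print(youden_cutoff, f1_cutoff)  # 0.3 0.8
```

On this toy data the Youden cutoff flags 6 of 15 samples and captures every case, while the F1 cutoff flags only 2 and misses one case, mirroring the trade-off in the abstract between a broad 20% high-risk group and a narrower 5% group.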
The final model, incorporating demographics, clinical features, and SDoH, can predict HIV risk in the next year for women. Several SDoH factors were found to be important predictors. Future work could involve stakeholders in implementing the model into HIV PrEP decision support and exploring causal pathways to guide risk-reduction interventions.