Yang Xueying, Cai Ruilie, Ma Yunqing, Zhang Hao H, Sun XiaoWen, Olatosi Bankole, Weissman Sharon, Li Xiaoming, Zhang Jiajia
South Carolina SmartState Center for Healthcare Quality, Arnold School of Public Health, University of South Carolina, Columbia, SC.
Department of Health Promotion, Education and Behavior, Arnold School of Public Health, University of South Carolina, Columbia, SC.
J Acquir Immune Defic Syndr. 2025 Mar 1;98(3):209-216. doi: 10.1097/QAI.0000000000003561.
This study aims to develop and examine the performance of machine learning (ML) algorithms in predicting viral suppression among statewide people living with HIV (PWH) in South Carolina.
The study population comprised adult PWH diagnosed between 2005 and 2021, identified through the electronic reporting system in South Carolina. Viral suppression was defined as viral load <200 copies/mL. The predictors, including sociodemographics, historical information on viral load indicators (eg, viral rebound), comorbidities, health care utilization, and annual county-level factors (eg, social vulnerability), were measured in each 4-month window. Using historical information in different lag time windows (1, 3, or 5 lagged time windows, with each 4-month window as a unit), both traditional and ML approaches (eg, Long Short-Term Memory Network) were applied to predict viral suppression. Prediction performance was compared across models using area under the curve (AUC), recall, precision, F1 score, and Youden index.
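As an illustration only, the lagged-window setup and the threshold-based metrics named above can be sketched in plain Python. This is not the authors' pipeline: the function names, the use of suppression history as the sole predictor, and the rule of skipping windows with missing measurements are all assumptions made for the example. Only the <200 copies/mL threshold and the 4-month window unit come from the abstract.

```python
from typing import Dict, List, Optional, Sequence

SUPPRESSION_THRESHOLD = 200  # copies/mL, the definition given in the abstract


def lagged_examples(viral_loads: Sequence[Optional[float]], n_lags: int) -> List[Dict]:
    """Build (lag features, target) pairs for one person.

    viral_loads[i] is the viral load in the i-th 4-month window,
    or None when no measurement exists (hypothetical encoding).
    """
    suppressed = [None if v is None else int(v < SUPPRESSION_THRESHOLD)
                  for v in viral_loads]
    examples = []
    for t in range(n_lags, len(suppressed)):
        history = suppressed[t - n_lags:t]  # the n_lags windows before window t
        target = suppressed[t]
        # Assumption: drop examples with any missing value in history or target.
        if target is None or any(h is None for h in history):
            continue
        examples.append({"lags": history, "target": target})
    return examples


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Recall, precision, F1 score, and Youden index from binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    recall = tp / (tp + fn) if (tp + fn) else 0.0       # sensitivity
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    youden = recall + specificity - 1                   # Youden's J statistic
    return {"recall": recall, "precision": precision, "f1": f1, "youden": youden}


if __name__ == "__main__":
    loads = [50, 500, 100, None, 150, 20]  # made-up viral loads, one per window
    print(lagged_examples(loads, n_lags=1))
    print(classification_metrics([1, 1, 1, 0, 0], [1, 1, 0, 0, 1]))
```

AUC is omitted here because it is computed from predicted probabilities rather than from thresholded labels. A longer lag uses more history per example but yields fewer usable examples when measurements are missing.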
ML approaches outperformed the generalized linear mixed model. In all 3 lagged analyses, which included a total of 15,580 PWH, the Long Short-Term Memory Network algorithm (Lag 1: AUC = 0.858; Lag 3: AUC = 0.877; Lag 5: AUC = 0.881) outperformed all other methods in terms of AUC for predicting viral suppression. The top-ranking predictors common to the different models included historical information on viral suppression, viral rebound, and viral blips in the Lag-1 time window. Inclusion of county-level variables did not improve model prediction accuracy.
Supervised ML algorithms may offer better performance for risk prediction of viral suppression than traditional statistical methods.