Yehadji Degninou, Gray Geraldine, Vicente Carlos Arias, Isaakidis Petros, Diallo Abdourahimi, Kamano Saa Andre, Diallo Thierno Saidou
Médecins Sans Frontières Belgique, Guinea Mission, Conakry, Guinea.
Technological University Dublin, School of Informatics and Cybersecurity, Dublin, Ireland.
Front Artif Intell. 2025 Mar 19;8:1446876. doi: 10.3389/frai.2025.1446876. eCollection 2025.
Viral load (VL) suppression is key to ending the global HIV epidemic, and predicting it is critical for healthcare providers and people living with HIV (PLHIV). Traditional research has focused on statistical analysis, but machine learning (ML) is gradually influencing HIV clinical care. Although ML has been used in various settings, little research supports antiretroviral therapy (ART) programs, especially in resource-limited settings such as Guinea. This study aims to identify the variables most predictive of VL suppression and to develop ML models for PLHIV in Conakry, Guinea.
Anonymized data from HIV patients in eight Conakry health facilities were pre-processed. Pre-processing included variable recoding, record removal, missing value imputation, grouping of small categories, creation of dummy variables, and oversampling of the smallest target class. Support vector machine (SVM), logistic regression (LR), naïve Bayes (NB), random forest (RF), and four stacked models were developed. Optimal parameters were determined through two cross-validation loops using a grid search approach. Sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), F1-score, and area under the curve (AUC) were computed on unseen data to assess model performance. RF was used to determine the most predictive variables.
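As an illustration of the threshold-based evaluation metrics named above, the following minimal sketch computes sensitivity, specificity, predictive values, and F1-score from true and predicted labels. It is not the study's code: the function name `binary_metrics`, the `BinaryMetrics` container, the choice of `1` as the positive class, and the toy labels are all assumptions made for this example. AUC is omitted because it requires predicted scores rather than hard labels.

```python
from dataclasses import dataclass


@dataclass
class BinaryMetrics:
    """Container for metrics derived from a 2x2 confusion matrix."""
    sensitivity: float
    specificity: float
    ppv: float
    npv: float
    f1: float


def _safe_div(numerator: float, denominator: float) -> float:
    # Return 0.0 instead of raising when a denominator is empty.
    return numerator / denominator if denominator else 0.0


def binary_metrics(y_true, y_pred, positive=1) -> BinaryMetrics:
    """Compute confusion-matrix metrics (hypothetical helper, not from the paper)."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1

    sensitivity = _safe_div(tp, tp + fn)  # true positive rate (recall)
    specificity = _safe_div(tn, tn + fp)  # true negative rate
    ppv = _safe_div(tp, tp + fp)          # positive predictive value (precision)
    npv = _safe_div(tn, tn + fn)          # negative predictive value
    # F1 is the harmonic mean of precision (PPV) and recall (sensitivity).
    f1 = _safe_div(2 * ppv * sensitivity, ppv + sensitivity)
    return BinaryMetrics(sensitivity, specificity, ppv, npv, f1)


# Toy labels for illustration only: 1 = suppressed, 0 = not suppressed (assumed coding).
y_true = [1, 1, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

When the target classes are imbalanced, as the oversampling step suggests they were here, a high F1-score for the majority class can coexist with low specificity, which is one reason to report several metrics side by side.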
RF (94% F1-score, 82% AUC) and NB (89% F1-score, 82% AUC) were the best-performing models for detecting VL suppression and non-suppression on unseen data. The optimal parameters for RF were 1,000 estimators and no maximum depth (random state = 40). RF identified Regimen schedule_6-Month, Duration on ART (months), Last ART CD4, Regimen schedule_Regular, and Last Pre-ART CD4 as the top predictors of VL suppression.
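The reported RF configuration can be sketched with scikit-learn as follows. This is a hedged illustration rather than the authors' pipeline: the abstract does not name the library or the importance measure used, so `RandomForestClassifier`, its impurity-based `feature_importances_`, the synthetic data from `make_classification`, and the placeholder feature names are all assumptions. Only the hyperparameters (1,000 estimators, no maximum depth, random state 40) come from the text.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic, imbalanced stand-in data (assumption: the real patient data are not public here).
X, y = make_classification(
    n_samples=300,
    n_features=6,
    n_informative=3,
    n_redundant=0,
    weights=[0.2, 0.8],
    random_state=0,
)
# Placeholder names; the study's actual variables include e.g. "Duration on ART (months)".
feature_names = [f"feature_{i}" for i in range(X.shape[1])]

# Hyperparameters as reported in the abstract.
model = RandomForestClassifier(n_estimators=1000, max_depth=None, random_state=40)
model.fit(X, y)

# Rank features by impurity-based importance (one possible measure, assumed here).
ranked = sorted(
    zip(feature_names, model.feature_importances_),
    key=lambda pair: pair[1],
    reverse=True,
)
for name, importance in ranked:
    print(f"{name}: {importance:.3f}")
```

Impurity-based importances can favor high-cardinality or continuous variables, so a ranking like this is usually worth cross-checking with permutation importance on held-out data.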
This study demonstrated that VL suppression can be predicted, but it has some limitations. The results depend on the quality of the data and are specific to the Guinea context, so their generalizability may be limited. Future work could replicate the study in a different context and develop the best-performing model into an application that can be tested in a clinical setting.