African Center of Excellence in Data Science, University of Rwanda, Kigali, BP 4285, Rwanda.
College of Engineering, Carnegie Mellon University Africa, Kigali, BP 6150, Rwanda.
BMC Med Res Methodol. 2021 Jul 31;21(1):159. doi: 10.1186/s12874-021-01346-2.
AIM: HIV prevention measures in sub-Saharan Africa still fall short of the UNAIDS 90-90-90 fast-track targets set in 2014. Identifying predictors of HIV status may facilitate targeted screening interventions that improve health care. We aimed to identify HIV predictors and to predict which persons are at high risk of infection.
METHOD: We built machine learning models using Population-based HIV Impact Assessment (PHIA) data for 41,939 male and 45,105 female respondents, with 30 and 40 variables respectively, from four countries in sub-Saharan Africa. We trained and validated the algorithms on 80% of the data and tested on the remaining 20%, rotating the left-out country. The algorithm with the best mean F1 score was retained and retrained on the most predictive variables. We used the resulting model to identify people living with HIV and individuals with a higher likelihood of contracting the disease.
RESULTS: The XGBoost algorithm appeared to identify HIV positivity significantly better than the other five algorithms, with mean F1 scores of 90% and 92% for males and females, respectively. The eight most predictive features in both sexes were: age, relationship with the family head, highest level of education, highest grade at that school level, work for payment, avoiding pregnancy, age at first sexual experience, and wealth quintile. Model performance using these variables increased significantly compared with using all variables. We found that five males and 19 females would need to be tested to find one HIV-positive individual. We also predicted that 4.14% of males and 10.81% of females are at high risk of infection.
CONCLUSION: Our findings suggest that the XGBoost algorithm, applied to socio-behavioural data, can substantially help identify HIV predictors and predict individuals at high risk of infection for targeted screening.
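The evaluation quantities named in the abstract can be illustrated with a minimal sketch. It is not the authors' code. It makes three assumptions: the country rotation is a leave-one-country-out split, F1 is computed for the positive class only (the abstract does not say which averaging was used), and the number needed to test is the count of flagged individuals divided by the true positives among them. The function names and toy labels are hypothetical.

```python
from collections import defaultdict


def f1_score(y_true, y_pred):
    """Positive-class F1: harmonic mean of precision and recall (assumed variant)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def leave_one_country_out(countries):
    """Yield (held_out, train_idx, test_idx), holding out each country in turn."""
    by_country = defaultdict(list)
    for i, country in enumerate(countries):
        by_country[country].append(i)
    for held_out in sorted(by_country):
        test_idx = by_country[held_out]
        train_idx = [i for c, idx in by_country.items() if c != held_out for i in idx]
        yield held_out, train_idx, test_idx


def number_needed_to_test(y_true, y_pred):
    """Flagged individuals per true positive found (assumed definition)."""
    flagged = sum(1 for p in y_pred if p == 1)
    true_pos = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    if true_pos == 0:
        return float("inf")
    return flagged / true_pos


# Toy data only: hypothetical labels (1 = HIV positive) and country codes.
y_true = [1, 0, 0, 1, 0, 0]
y_pred = [1, 1, 0, 1, 0, 1]
countries = ["A", "A", "B", "B", "C", "C"]

for held_out, train_idx, test_idx in leave_one_country_out(countries):
    fold_true = [y_true[i] for i in test_idx]
    fold_pred = [y_pred[i] for i in test_idx]
    print(held_out, f1_score(fold_true, fold_pred))

print(number_needed_to_test(y_true, y_pred))
```

In a real pipeline the predictions in each fold would come from a classifier such as XGBoost fitted on the training indices; the toy predictions here are fixed only to keep the sketch free of dependencies.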