Birri Makota Rutendo Beauty, Musenge Eustasius
Division of Epidemiology and Biostatistics, School of Public Health, Faculty of Health Sciences, University of the Witwatersrand, Johannesburg, South Africa.
PLOS Digit Health. 2023 Jun 7;2(6):e0000260. doi: 10.1371/journal.pdig.0000260. eCollection 2023 Jun.
The burden of HIV and related diseases has been an area of great concern in Zimbabwe both before and after the emergence of COVID-19. Machine learning models have been used to accurately predict the risk of diseases, including HIV. This paper therefore aimed to determine common risk factors for HIV positivity in Zimbabwe over the decade from 2005 to 2015. The data came from three two-stage, five-yearly population surveys conducted between 2005 and 2015. The outcome variable was HIV status. The prediction model was fitted using 80% of the data for learning/training and 20% for testing/prediction. Resampling was done using repeated stratified 5-fold cross-validation. Feature selection was done using Lasso regression, and the best combination of selected features was determined using Sequential Forward Floating Selection. We compared six algorithms for each sex based on the F1 score, which is the harmonic mean of precision and recall. The overall HIV prevalence in the combined dataset was 22.5% for females and 15.3% for males. The best-performing algorithm for identifying individuals with a higher likelihood of HIV infection was XGBoost, with an F1 score of 91.4% for males and 90.1% for females based on the combined surveys. The prediction model identified six common features associated with HIV, with total number of lifetime sexual partners and cohabitation duration being the most influential variables for females and males, respectively. In addition to other risk reduction techniques, machine learning may aid in identifying those who might require pre-exposure prophylaxis, particularly women who experience intimate partner violence. Furthermore, compared with traditional statistical approaches, machine learning uncovered patterns in predicting HIV infection with comparatively reduced uncertainty and is therefore crucial for effective decision-making.
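The abstract describes the F1 score as the harmonic mean of precision and recall. The following is a minimal sketch of that calculation, not the authors' code: the function name `precision_recall_f1` and the toy labels are hypothetical and are not taken from the study data. It also illustrates why F1 can be more informative than accuracy when the positive class is a minority, as with the prevalences reported above.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Return (precision, recall, F1) for the given positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    # Guard against division by zero when a class is never predicted or absent.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0

    # F1 is the harmonic mean of precision and recall.
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical toy labels (1 = HIV positive, 0 = HIV negative); not study data.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_model = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]   # one missed case, one false alarm
y_all_negative = [0] * len(y_true)          # trivial "nobody is positive" rule

model_scores = precision_recall_f1(y_true, y_model)
trivial_scores = precision_recall_f1(y_true, y_all_negative)

# Accuracy of the trivial rule, for comparison with its F1 score.
trivial_accuracy = sum(t == p for t, p in zip(y_true, y_all_negative)) / len(y_true)

print("model precision, recall, F1:", model_scores)
print("trivial rule F1:", trivial_scores[2], "accuracy:", trivial_accuracy)
```

In this toy example the trivial rule is correct for most individuals yet has an F1 score of zero, because it identifies no positive cases. That is the behaviour that makes F1 a reasonable basis for comparing classifiers on an imbalanced outcome such as HIV status.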