Majam M, Segal B, Fieggen J, Smith Eli, Hermans L, Singh L, Phatsoane M, Arora L, Lalla-Edward S T
Ezintsha, Faculty of Health Sciences, University of the Witwatersrand, Johannesburg, South Africa.
Phithos Technologies, Johannesburg, South Africa.
Inform Med Unlocked. 2023;37:101192. doi: 10.1016/j.imu.2023.101192.
Digital data collection and the associated mobile health technologies have recently enabled the exploration of artificial intelligence as a tool for combatting the HIV epidemic. Machine learning has been found to be useful both in HIV risk prediction and as a decision support tool for guiding pre-exposure prophylaxis (PrEP) treatment. This paper reports data from two sequential studies evaluating the viability of using machine learning to predict the susceptibility of adults to HIV infection from responses to a digital survey deployed in a high-burden, low-resource setting.
The two trials recruited 1036 and 593 participants, respectively. The first trial was a cross-sectional study at one location and the second was a cohort study across three trial sites. The data from the studies were merged, partitioned using standard techniques, and then used to train and evaluate multiple machine learning models, from which a final model was selected and evaluated. Variable importance estimates were calculated using the PIMP and SHAP methodologies.
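The abstract does not say which partitioning technique was used. As a minimal sketch, assuming a stratified train/hold-out split (a common choice when the outcome is imbalanced), the idea can be illustrated in plain Python. The function name `stratified_split`, the 20% hold-out fraction, and the synthetic labels are illustrative assumptions, not details from the study.

```python
import random
from collections import defaultdict


def stratified_split(labels, holdout_fraction=0.2, seed=0):
    """Split indices into train and hold-out sets, preserving class proportions.

    Hypothetical helper: the 20% hold-out fraction is an assumption,
    not a value reported in the abstract.
    """
    rng = random.Random(seed)

    # Group row indices by outcome class (e.g. 1 = HIV positive, 0 = negative).
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    train, holdout = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_holdout = round(len(indices) * holdout_fraction)
        holdout.extend(indices[:n_holdout])
        train.extend(indices[n_holdout:])
    return sorted(train), sorted(holdout)


# Synthetic labels matching only the overall class sizes implied by the
# reported denominators (262 positive, 1368 negative); no real data is used.
labels = [1] * 262 + [0] * 1368
train_idx, holdout_idx = stratified_split(labels)
```

Stratifying keeps the share of HIV positive participants roughly equal in both partitions, so hold-out estimates are not distorted by an unlucky draw of the minority class.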
Characteristics associated with HIV were consistent across both studies. Overall, HIV positive participants had a higher median age (34 [IQR 29-39] vs 26 [IQR 22-33], p < 0.001) and were more likely to be female (155/703 [22%] vs 107/927 [12%], p < 0.001). HIV positive participants were also more likely to have gone a year or more since their last HIV test (183/262 [70%] vs 540/1368 [39%], p < 0.001) and were less likely to report consistent condom usage (113/262 [43%] vs 758/1368 [55%], p < 0.001). Participants who reported tuberculosis (TB) symptoms were more likely to be HIV positive. The trained models had areas under the receiver operating characteristic curve (AUROCs) ranging from 78.5% to 82.8%. A boosted tree model performed best, with a sensitivity of 84% (95% CI 72-92), a specificity of 71% (95% CI 67-76), and a negative predictive value of 95% (95% CI 93-96) in a hold-out dataset. Age, duration since last HIV test, and number of male sexual partners were consistently three of the four most important variables across both variable importance estimates.
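For readers less familiar with these metrics, the sketch below shows how sensitivity, specificity, negative predictive value, and AUROC are defined. The confusion-matrix counts and scores are hypothetical toy values chosen for illustration; they are not the study's hold-out data, and the function names are assumptions.

```python
def screening_metrics(tp, fp, tn, fn):
    """Compute standard screening metrics from confusion-matrix counts."""
    return {
        # Share of truly positive participants the model flags.
        "sensitivity": tp / (tp + fn),
        # Share of truly negative participants the model clears.
        "specificity": tn / (tn + fp),
        # Share of model-negative participants who are truly negative.
        "npv": tn / (tn + fn),
    }


def auroc(scores, labels):
    """Probability that a random positive outranks a random negative.

    Ties count as half a win. This is a rank-based measure of
    discrimination, which is distinct from classification accuracy.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(
        (p > n) + 0.5 * (p == n) for p in positives for n in negatives
    )
    return wins / (len(positives) * len(negatives))


# Hypothetical counts and scores, not taken from the study.
metrics = screening_metrics(tp=8, fp=30, tn=60, fn=2)
toy_auroc = auroc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0])
```

Unlike sensitivity and specificity, the negative predictive value depends on how common HIV is in the evaluated population, so it would be expected to differ in settings with a different prevalence.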
This study highlights the synergies between mobile health and machine learning in HIV. It demonstrates that a viable machine learning model, with potential utility in directing health resources, can be built using digital survey data from a low- and middle-income setting.