

A Machine Learning Approach to Predict HIV Viral Load Hotspots in Kenya Using Real-World Data.

Author information

Kagendi Nancy, Mwau Matilu

Affiliation

Kenya Medical Research Institute, Nairobi, Kenya.

Publication information

Health Data Sci. 2023 Oct 2;3:0019. doi: 10.34133/hds.0019. eCollection 2023.

Abstract

BACKGROUND

Machine learning models are not in routine use for predicting HIV status. Our objective is to describe the development of a machine learning model to predict HIV viral load (VL) hotspots as an early warning system in Kenya, based on data routinely collected by affiliate entities of the Ministry of Health. Based on World Health Organization recommendations, hotspots are health facilities where ≥20% of people living with HIV have an unsuppressed VL. Predicting VL hotspots gives health administrators an early warning system for optimizing treatment and resource distribution.
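The hotspot definition above can be illustrated with a minimal sketch. The abstract does not state the viral load cutoff used for "suppressed", so the 1,000 copies/mL value below is an assumption, as are the function name `flag_hotspots`, the input layout (one most-recent result per patient), and the toy data.

```python
from collections import defaultdict

# Assumption: the abstract gives no suppression cutoff; 1,000 copies/mL is
# used here purely for illustration and is exposed as a parameter.
SUPPRESSION_CUTOFF = 1000
# From the abstract: a facility is a hotspot when >= 20% are not suppressed.
HOTSPOT_SHARE = 0.20


def flag_hotspots(tests, cutoff=SUPPRESSION_CUTOFF, share=HOTSPOT_SHARE):
    """Return {facility_id: is_hotspot}.

    tests: iterable of (facility_id, viral_load) pairs, assumed to hold
    one result per patient (hypothetical input layout).
    """
    totals = defaultdict(int)
    unsuppressed = defaultdict(int)
    for facility_id, viral_load in tests:
        totals[facility_id] += 1
        if viral_load >= cutoff:
            unsuppressed[facility_id] += 1
    return {f: unsuppressed[f] / totals[f] >= share for f in totals}


# Toy data: facility A has 1 of 5 unsuppressed (20%), B has 1 of 6 (~17%).
tests = [("A", 50), ("A", 2500), ("A", 40), ("A", 30), ("A", 20),
         ("B", 10), ("B", 20), ("B", 30), ("B", 40), ("B", 50), ("B", 5000)]
hotspots = flag_hotspots(tests)
```

In this toy example facility A sits exactly on the 20% boundary and is flagged, while facility B falls just below it.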

METHODS

A random forest model was built to predict the hotspot status of a health facility in the upcoming month, starting from 2016. Before model building, the datasets were cleaned and checked for outliers and multicollinearity at the patient level, and the patient-level data were then aggregated to the facility level. We analyzed data from 4 million tests and 4,265 facilities. The facility-level dataset was divided into training (75%) and test (25%) sets.
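One way to picture the data shaping described above is to pair each facility's values in month t with its hotspot status in month t+1, then split the resulting rows 75/25. The sketch below shows only that step and omits the random forest itself. The single feature (unsuppressed share), the random row-level split, and the names `build_next_month_pairs` and `split_rows` are assumptions, since the abstract does not list the predictors or say how the split was drawn.

```python
import random


def build_next_month_pairs(monthly_share, threshold=0.20):
    """Build (facility_id, feature, label) rows.

    monthly_share: {facility_id: [share_month0, share_month1, ...]}, where
    each value is the facility's unsuppressed share in that month
    (hypothetical feature). The label is hotspot status in the next month.
    """
    rows = []
    for facility_id, shares in monthly_share.items():
        for current, following in zip(shares, shares[1:]):
            rows.append((facility_id, current, following >= threshold))
    return rows


def split_rows(rows, train_fraction=0.75, seed=0):
    """Shuffle a copy of the rows and cut it into train and test parts."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Toy data: three months of unsuppressed shares for four facilities.
monthly_share = {
    "A": [0.10, 0.25, 0.30],
    "B": [0.05, 0.08, 0.12],
    "C": [0.22, 0.18, 0.21],
    "D": [0.15, 0.19, 0.20],
}
rows = build_next_month_pairs(monthly_share)
train_rows, test_rows = split_rows(rows)
```

A classifier such as a random forest would then be fitted on `train_rows` and evaluated on `test_rows`. Splitting by facility or by calendar time instead of by row is an equally plausible reading of the abstract.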

RESULTS

The model discriminates hotspots from non-hotspots with an accuracy of 78%. The F1 score of the model is 69% and the Brier score is 0.139. In December 2019, our model correctly predicted 434 VL hotspots in addition to the observed 446 VL hotspots.
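For readers unfamiliar with the three metrics reported above, the following sketch computes accuracy, F1 score, and Brier score from their standard definitions on made-up labels and probabilities. The toy values and the 0.5 decision threshold are assumptions for illustration and are unrelated to the study's data.

```python
def accuracy(y_true, y_pred):
    """Share of predictions that match the observed labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def brier_score(y_true, y_prob):
    """Mean squared gap between predicted probability and outcome (0 is best)."""
    return sum((p - t) ** 2 for t, p in zip(y_true, y_prob)) / len(y_true)


# Toy example: 1 = hotspot, 0 = non-hotspot (illustrative values only).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_prob = [0.9, 0.2, 0.4, 0.7, 0.6, 0.1, 0.8, 0.3]
y_pred = [int(p >= 0.5) for p in y_prob]  # assumed 0.5 decision threshold

acc = accuracy(y_true, y_pred)
f1 = f1_score(y_true, y_pred)
brier = brier_score(y_true, y_prob)
```

Accuracy and F1 are computed from the thresholded predictions, whereas the Brier score uses the predicted probabilities directly, so it also reflects how well calibrated the model is.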

CONCLUSION

The hotspot mapping model can be essential to antiretroviral therapy programs. It can help decision-makers identify VL hotspots ahead of time using cost-efficient, routinely collected data.


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/12f8/10880164/e5cb6307b94a/hds.0019.fig.001.jpg
