Machine learning to improve HIV screening using routine data in Kenya.

Author information

Friedman Jonathan D, Mwangi Jonathan M, Muthoka Kennedy J, Otieno Benedette A, Odhiambo Jacob O, Miruka Frederick O, Nyagah Lilly M, Mwele Pascal M, Obat Edmon O, Omoro Gonza O, Ndisha Margaret M, Kimanga Davies O

Affiliations

Data Science, Palladium Group, Washington, DC, USA.

Division of Global HIV & TB, US Centers for Disease Control and Prevention, Nairobi, Kenya.

Publication information

J Int AIDS Soc. 2025 Apr;28(4):e26436. doi: 10.1002/jia2.26436.

Abstract

INTRODUCTION

Optimal use of HIV testing resources accelerates progress towards ending HIV as a global threat. In Kenya, current testing practices yield a 2.8% positivity rate for new diagnoses reported through the national HIV electronic medical record (EMR) system. Increasingly, researchers have explored the potential for machine learning to improve the identification of people with undiagnosed HIV for referral for HIV testing. However, few studies have used routinely collected programme data as the basis for implementing a real-time clinical decision support system to improve HIV screening. In this study, we applied machine learning to routine programme data from Kenya's EMR to predict the probability that an individual seeking care is undiagnosed HIV positive and should be prioritized for testing.

METHODS

We combined de-identified individual-level EMR data from 167,509 individuals without a previous HIV diagnosis who were tested between June and November 2022. As model variables, we included demographics, clinical histories and HIV-relevant behavioural practices, together with open-source data describing population-level behavioural practices. We used multiple imputations to address high rates of missing data, selecting the optimal technique based on out-of-sample error. We generated a stratified 60-20-20 train-validate-test split to assess model generalizability. We trained four machine learning algorithms: logistic regression, Random Forest, AdaBoost and XGBoost. Models were evaluated using Area Under the Precision-Recall Curve (AUCPR), a metric that is well-suited to cases of class imbalance such as this, in which there are far more negative test results than positive.
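As an illustration of the stratified 60-20-20 split described above, the following is a minimal sketch using only the Python standard library. It is not the authors' code: the function name `stratified_split`, the seed and the synthetic labels (3% positive) are assumptions chosen for the example. The idea is to shuffle and split each class separately so that the share of positive results is the same in the training, validation and test sets.

```python
import random
from collections import defaultdict


def stratified_split(labels, fractions=(0.6, 0.2, 0.2), seed=0):
    """Return (train, validate, test) index lists that preserve class ratios.

    Hypothetical helper for illustration; fractions are applied per class.
    """
    rng = random.Random(seed)

    # Group row indices by their class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train, validate, test = [], [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_train = round(fractions[0] * len(indices))
        n_val = round(fractions[1] * len(indices))
        train.extend(indices[:n_train])
        validate.extend(indices[n_train:n_train + n_val])
        # Whatever remains in this class goes to the test set.
        test.extend(indices[n_train + n_val:])
    return train, validate, test


# Synthetic, imbalanced labels (assumed for the example): 30 positives in 1,000.
labels = [1] * 30 + [0] * 970
train, validate, test = stratified_split(labels)

for name, part in (("train", train), ("validate", validate), ("test", test)):
    positives = sum(labels[i] for i in part)
    print(name, len(part), positives / len(part))
```

Splitting within each class matters when positives are rare: a purely random split could leave the validation or test set with too few positive cases to estimate performance reliably.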

RESULTS

All model types demonstrated predictive performance on the test set with AUCPR that exceeded the current positivity rate. XGBoost generated the greatest AUCPR, 10.5 times greater than the rate of positive test results.
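To make the comparison between AUCPR and the positivity rate concrete, here is a small sketch in plain Python. It is an illustration under stated assumptions, not the study's evaluation code: `average_precision` is a hypothetical helper that uses the common step-wise (average precision) estimate of the area under the precision-recall curve, and the labels and scores are made up. A ranking that carries no information is expected to score roughly the positivity rate on this metric, which is why the ratio of AUCPR to the positivity rate can be read as a lift over untargeted testing.

```python
def average_precision(y_true, scores):
    """Step-wise estimate of the area under the precision-recall curve.

    Hypothetical helper for illustration. Tied scores are ranked in input
    order rather than grouped into a single threshold.
    """
    total_pos = sum(y_true)
    if total_pos == 0:
        raise ValueError("need at least one positive label")

    # Rank individuals from highest to lowest predicted probability.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

    hits = 0
    ap = 0.0
    for rank, i in enumerate(order, start=1):
        if y_true[i] == 1:
            hits += 1
            ap += hits / rank  # precision at the rank of each positive
    return ap / total_pos


# Made-up example: 2 positives among 10 people (20% positivity rate).
y_true = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]

aucpr = average_precision(y_true, scores)
positivity_rate = sum(y_true) / len(y_true)
lift = aucpr / positivity_rate
print(round(aucpr, 4), positivity_rate, round(lift, 2))
```

In this toy case the lift is just over 4; the figure reported above for XGBoost is the analogous ratio computed on the study's test set.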

CONCLUSIONS

Our study demonstrated that machine learning applied to routine HIV testing data may be used as a clinical decision support tool to refer persons for HIV testing. The resulting model could be integrated in the screening form of an EMR and used as a real-time decision support tool to inform testing decisions. Although issues of data quality and missing data remained, these challenges could be addressed using sound data preparation techniques.


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7f95/12010039/3555d53fcc45/JIA2-28-e26436-g001.jpg
