Department of Health Informatics, School of Public Health, College of Medicine and Health Sciences, Wollo University, Dessie, Ethiopia.
Department of Health Informatics, Institute of Public Health, College of Medicine and Health Sciences, University of Gondar, Gondar, Ethiopia.
BMC Infect Dis. 2023 Jan 23;23(1):49. doi: 10.1186/s12879-023-07987-6.
INTRODUCTION: Sexually transmitted infections (STIs) are a major public health problem globally, affecting millions of people every day. The burden is high in the Sub-Saharan region, including Ethiopia. Moreover, there is little evidence on the distribution of STIs across Ethiopian regions, so a better understanding of these infections is important for lessening their burden on society. This article therefore aimed to assess predictors of STIs using machine learning techniques and to describe their geographic distribution across Ethiopian regions. Assessing the predictors of STIs and their spatial distribution could help policymakers understand the problem better and design interventions accordingly.
METHODS: A community-based cross-sectional study was conducted from January 18, 2016, to June 27, 2016, using the 2016 Ethiopian Demographic and Health Survey (EDHS) dataset. We applied spatial autocorrelation analysis using the Global Moran's I statistic to detect latent STI clusters. Spatial scan statistics based on the Bernoulli model were run in SaTScan™ to identify significant local clusters. Supervised machine learning models (C5.0 decision tree, random forest, support vector machine, naïve Bayes, and logistic regression) were applied to the 2016 EDHS dataset for STI prediction, and their performance was compared. Association rules were mined using an unsupervised machine learning algorithm.
RESULTS: The spatial distribution of STIs in Ethiopia was clustered across the country (global Moran's index = 0.06, p value = 0.04). The random forest algorithm performed best for STI prediction, with a balanced accuracy of 69.48% and an area under the curve of 68.50%. The random forest model showed that region, wealth, age category, educational level, age at first sex, working status, marital status, media access, alcohol drinking, khat chewing, and sex of the respondent were the top 11 predictors of STI in Ethiopia.
CONCLUSION: The random forest machine learning algorithm is the proposed model for predicting STIs in Ethiopia and identifying their predictors.
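ILLUSTRATION: The two summary statistics reported above, the Global Moran's I and balanced accuracy, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' analysis code: the function names `morans_i` and `balanced_accuracy`, the four-area chain adjacency matrix, and all toy values are hypothetical and are not drawn from the EDHS data.

```python
def morans_i(values, weights):
    """Global Moran's I for one value per area and a spatial weight matrix.

    I = (n / S0) * sum_ij w_ij (x_i - mean)(x_j - mean) / sum_i (x_i - mean)^2
    where S0 is the sum of all weights.
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(sum(row) for row in weights)
    cross = sum(
        weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n)
    )
    denom = sum(d * d for d in dev)
    return (n / s0) * (cross / denom)


def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity for binary labels (1 = STI, 0 = no STI)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


# Hypothetical toy setup: four areas in a row, each adjacent to its neighbours.
chain_weights = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
clustered = [1, 1, 0, 0]    # similar values sit next to each other
alternating = [1, 0, 1, 0]  # neighbouring values always differ

i_clustered = morans_i(clustered, chain_weights)
i_alternating = morans_i(alternating, chain_weights)

# Hypothetical predictions for six respondents.
bal_acc = balanced_accuracy([1, 1, 0, 0, 0, 0], [1, 0, 0, 0, 0, 1])

print(round(i_clustered, 3), round(i_alternating, 3), round(bal_acc, 3))
```

A positive Moran's I indicates that neighbouring areas tend to have similar values (clustering), and a negative value indicates dispersion. Balanced accuracy averages sensitivity and specificity, which makes it more informative than plain accuracy when one outcome class is rare.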