Department of Health Informatics, School of Public Health, College of Medicine and Health Sciences, Wollo University, Dessie, Ethiopia.
Department of Medical Informatics, Institute for Community Medicine, University Medicine Greifswald, Greifswald, Germany.
BMC Infect Dis. 2024 Mar 21;24(1):338. doi: 10.1186/s12879-024-09195-2.
Numerous studies have shown that infectious diseases cause the majority of deaths among under-five children. Worldwide, acute respiratory infection (ARI) remains the second most frequent cause of illness and mortality among children under the age of five. ARI is still the leading disease burden in developing nations, including Ethiopia.
This study aims to determine the magnitude and predictors of ARI among under-five children in Ethiopia using state-of-the-art machine learning algorithms.
Data for this study were derived from the 2016 Ethiopian Demographic and Health Survey. To identify predictors of acute respiratory infection, we performed several experiments on ten machine learning algorithms (random forests, decision trees, support vector machines, Naïve Bayes, K-nearest neighbors, Lasso regression, GBoost, and XGBoost), including one classic logistic regression model and an ensemble of the best-performing models. The predictive ability of each machine learning model was assessed using receiver operating characteristic (ROC) curves, precision-recall curves, and classification metrics.
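As a rough illustration of the evaluation described above, the following sketch averages the predicted probabilities of several models (soft voting) and scores the result by the area under the ROC curve. It is a minimal sketch, not the authors' pipeline: the abstract does not say how the ensemble was combined, so soft voting is an assumption, and the function names `soft_vote` and `roc_auc` and all of the toy labels and scores are invented for the example.

```python
from typing import List, Sequence


def soft_vote(prob_lists: Sequence[Sequence[float]]) -> List[float]:
    """Average the positive-class probabilities produced by several models."""
    n_models = len(prob_lists)
    return [sum(column) / n_models for column in zip(*prob_lists)]


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC-ROC as the share of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data, invented for illustration only (1 = ARI, 0 = no ARI).
y_true = [1, 0, 1, 0, 0, 1]
model_a = [0.9, 0.2, 0.4, 0.60, 0.1, 0.7]  # hypothetical model scores
model_b = [0.8, 0.3, 0.6, 0.70, 0.2, 0.5]
model_c = [0.7, 0.1, 0.8, 0.65, 0.3, 0.9]

ensemble_scores = soft_vote([model_a, model_b, model_c])
ensemble_auc = roc_auc(y_true, ensemble_scores)
print(f"ensemble AUC-ROC: {ensemble_auc:.3f}")
```

The pairwise definition of AUC used here is quadratic in the number of samples, which is fine for a toy example; on survey-sized data a rank-based implementation from an established library would be the more practical choice.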
The overall prevalence of ARI among 9501 under-five children in Ethiopia was 7.2%. The ensemble model of SVM, GBoost, and XGBoost showed improved performance in classifying ARI cases, with an accuracy of 86%, a sensitivity of 84.6%, and an AUC-ROC of 0.87. The highest-performing predictive model (the ensemble model) showed that the child's age, history of diarrhea, wealth index, type of toilet, mother's educational level, number of living children, mother's occupation, and type of fuel used were important predictors of acute respiratory infection among under-five children.
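For readers less familiar with the metrics reported above, the sketch below shows how accuracy, sensitivity, specificity, precision, and prevalence follow from the four cells of a confusion matrix. The function name `classification_metrics` and the counts are hypothetical; they are not the study's confusion matrix, which the abstract does not report.

```python
from typing import Dict


def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Derive common classification metrics from confusion-matrix counts."""
    total = tp + fp + tn + fn
    if total == 0 or (tp + fn) == 0 or (tn + fp) == 0 or (tp + fp) == 0:
        raise ValueError("counts must allow every ratio to be computed")
    return {
        "accuracy": (tp + tn) / total,      # share of all children classified correctly
        "sensitivity": tp / (tp + fn),      # share of true cases that were detected
        "specificity": tn / (tn + fp),      # share of non-cases correctly ruled out
        "precision": tp / (tp + fp),        # share of predicted cases that are real
        "prevalence": (tp + fn) / total,    # share of children who are true cases
    }


# Invented counts for illustration only (100 cases among 1000 children).
metrics = classification_metrics(tp=85, fp=50, tn=850, fn=15)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With a rare outcome, accuracy is dominated by the non-cases, so sensitivity and precision are usually worth reading alongside it.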
The factors contributing to ARI among under-five children were identified using advanced machine learning algorithms. The child's age, history of diarrhea, wealth index, and type of toilet were among the top factors identified by the ensemble model, which achieved an accuracy of 86%. This study demonstrates the potential of advanced data-driven methods for understanding the complexities of ARI in low-income settings.