Tadese Zinabu Bekele, Hailu Debela Tsegaye, Abebe Aschale Wubete, Kebede Shimels Derso, Walle Agmasie Damtew, Seifu Beminate Lemma, Nimani Teshome Demis
Department of Health Informatics, College of Medicine and Health Science, Samara University, Samara, Ethiopia.
Department of Health Informatics, School of Public Health, Bule Hora University, Bule Hora, Ethiopia.
Digit Health. 2024 Aug 6;10:20552076241272739. doi: 10.1177/20552076241272739. eCollection 2024 Jan-Dec.
Although the prevalence of childhood illnesses has decreased significantly, acute respiratory infections remain the leading cause of death and disease among children in low- and middle-income countries. In the Ethiopian Demographic and Health Survey, seven percent of children under five had experienced symptoms in the two weeks preceding the survey. This study therefore aimed to identify interpretable predictors of acute respiratory infection among under-five children in Ethiopia using machine learning techniques.
A secondary data analysis was performed using data from the 2016 Ethiopian Demographic and Health Survey. Data were extracted using STATA and imported into Jupyter Notebook for further analysis. The outcome variable was the presence of acute respiratory infection in a child under the age of 5, coded as yes or no. Five ensemble boosting machine learning algorithms, namely adaptive boosting (AdaBoost), extreme gradient boosting (XGBoost), gradient boosting, CatBoost, and light gradient-boosting machine (LightGBM), were applied to a total sample of 10,641 children under the age of 5. The Shapley additive explanations (SHAP) technique was used to identify the important features and the effect of each feature on the prediction.
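To make the idea behind Shapley additive explanations concrete, the following is a minimal sketch of an exact Shapley value computation for a single prediction. It is not the study's pipeline: `toy_risk_score`, its coefficients, the example child, and the baseline child are all hypothetical, and the sketch uses one fixed baseline instead of the background dataset and tree-specific approximations that SHAP tooling typically relies on.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, instance, baseline):
    """Exact Shapley values of each feature for one prediction.

    Features in a coalition take their value from `instance`;
    features outside it take their value from `baseline`.
    """
    n = len(instance)
    features = list(range(n))

    def value(coalition):
        x = [instance[i] if i in coalition else baseline[i] for i in features]
        return predict(x)

    phi = []
    for i in features:
        others = [j for j in features if j != i]
        total = 0.0
        for k in range(len(others) + 1):
            # Weight of a coalition of size k in the Shapley formula.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                coalition = set(subset)
                total += weight * (value(coalition | {i}) - value(coalition))
        phi.append(total)
    return phi


def toy_risk_score(x):
    """Hypothetical risk score; the coefficients are invented for illustration."""
    age_months, had_diarrhea, living_children = x
    return (0.5 - 0.005 * age_months + 0.2 * had_diarrhea
            + 0.02 * living_children * had_diarrhea)


# Hypothetical child and reference child (not taken from the survey data).
instance = [12, 1, 4]   # age in months, history of diarrhea, living children
baseline = [30, 0, 3]

phi = shapley_values(toy_risk_score, instance, baseline)
for name, contribution in zip(["child age", "diarrhea", "living children"], phi):
    print(f"{name}: {contribution:+.3f}")
```

The contributions sum to the difference between the score for the example child and the score for the baseline child, which is the additive property that lets each feature's effect on a prediction be read off directly.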
After model optimization, the XGBoost model achieved an accuracy of 79.3%, an F1 score of 78.4%, a recall of 78.3%, a precision of 81.7%, and an area under the receiver operating characteristic curve of 86.1%. Child age (in months), history of diarrhea, number of living children, duration of breastfeeding, and mother's occupation were the top predictors of acute respiratory infection among children under the age of 5 in Ethiopia.
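For readers less familiar with these metrics, the sketch below shows how accuracy, precision, recall, F1, and the area under the ROC curve are defined for a binary outcome. The confusion-matrix counts, labels, and scores are hypothetical and are not derived from the study; the abstract also does not say whether its figures were averaged across classes, which this sketch does not attempt.

```python
def classification_metrics(tp, fp, fn, tn):
    """Binary classification metrics from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def roc_auc(labels, scores):
    """AUC as the probability that a random positive case is scored
    higher than a random negative case (ties count as one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical counts and scores, for illustration only.
metrics = classification_metrics(tp=80, fp=20, fn=20, tn=80)
auc = roc_auc(labels=[1, 1, 0, 0, 1, 0],
              scores=[0.9, 0.8, 0.7, 0.3, 0.6, 0.2])

print(metrics)
print(f"AUC: {auc:.3f}")
```

The pairwise AUC definition is quadratic in the number of samples, which is fine for a small illustration but would normally be replaced by a rank-based computation on a sample of roughly ten thousand children.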
The XGBoost classifier was the best-performing predictive model, and the predictors of acute respiratory infection were identified with the help of Shapley additive explanations. The findings of this study can help policymakers and stakeholders understand the decision-making process for acute respiratory infection prevention among under-five children in Ethiopia.