Demsash Addisalem Workie, Abebe Rediet, Gezimu Wubishet, Kitil Gemeda Wakgari, Tizazu Michael Amera, Lembebo Abera, Bekele Firomsa, Alemu Solomon Seyife, Jarso Mohammedamin Hajure, Dube Geleta, Wedajo Lema Fikadu, Purohit Sanju, Kalayou Mulugeta Hayelom
Debre Berhan University, Asrat Woldeyes Health Science Campus, Public Health Department, Debre Berihan, Ethiopia.
Mattu University, Health Science College, Mettu, Ethiopia.
BMC Infect Dis. 2025 May 2;25(1):647. doi: 10.1186/s12879-025-10916-4.
Pneumonia is the leading cause of child morbidity and mortality and accounts for 5.6 million deaths among children under five. It has a significant impact on quality of life, on national economies, and on child survival. This study therefore aimed to develop a data-driven predictive model, using machine learning algorithms, to predict pneumonia and stratify its determinant factors among children aged 6-23 months in Ethiopia.
A total of 2035 children were sampled from the 2016 Ethiopian Demographic and Health Survey (EDHS) dataset. Data management and analysis were carried out in Jupyter Notebook, launched from Anaconda Navigator, using Python libraries such as Pandas, Seaborn, and NumPy. After pre-processing, the data were split into training and testing sets at a 4:1 ratio, and tenfold cross-validation was used to reduce bias and enhance the models' performance. Six machine learning algorithms were built and compared, and the performance of each was evaluated from the elements of its confusion matrix. Principal component analysis and a heatmap were used to detect correlation between features. Feature importance scores were used to identify and stratify the most important predictors of pneumonia.
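The splitting and evaluation steps described above can be illustrated with a minimal sketch. It assumes a simple shuffled 4:1 split and plain (unstratified) tenfold partitioning; the abstract does not state how the authors implemented either step, so the function names `split_indices`, `k_folds`, and `confusion_metrics` are hypothetical and the block uses only the Python standard library rather than the authors' code.

```python
import random


def split_indices(n, test_fraction=0.2, seed=0):
    """Shuffle 0..n-1 and return (train, test) index lists (4:1 by default)."""
    rng = random.Random(seed)
    idx = list(range(n))
    rng.shuffle(idx)
    n_test = round(n * test_fraction)
    return idx[n_test:], idx[:n_test]


def k_folds(indices, k=10):
    """Yield (fit, validate) index lists for k-fold cross-validation."""
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        validate = folds[i]
        fit = [j for no, fold in enumerate(folds) if no != i for j in fold]
        yield fit, validate


def confusion_metrics(y_true, y_pred):
    """Compute accuracy, sensitivity, and specificity for binary labels (1 = case)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs) if pairs else 0.0,
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,
    }


# Sample size taken from the abstract; everything else is illustrative.
train, test = split_indices(2035)
fold_sizes = [len(validate) for _, validate in k_folds(train, k=10)]

# Toy labels only, not study data.
toy = confusion_metrics([1, 1, 0, 0, 1, 0], [1, 0, 0, 0, 1, 1])
```

With 2035 records, a 4:1 split leaves 1628 for training and 407 for testing, and each of the ten validation folds holds 162 or 163 training records. In practice the same steps are usually done with a library such as scikit-learn; the hand-written version here is only meant to make the arithmetic visible.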
Of the 2035 children sampled, 16.6%, 20.1%, and 24.2% had short, rapid breathing, fever, and cough, respectively. The overall prevalence of pneumonia among children aged 6-23 months was 31.3%, based on the 2016 Ethiopian Demographic and Health Survey (EDHS). The random forest algorithm was the best-performing model for predicting pneumonia and stratifying its determinants, with 91.3% accuracy. Health facility visits, child sex, initiation of breastfeeding, birth interval, birth weight, husband's education, woman's age, and region were the top eight predictors of pneumonia, with importance scores ranging from more than 5% up to 20%.
Random forest was the best-performing model for predicting pneumonia and stratifying its determinant factors. The findings can inform tailored health interventions, such as lifestyle modification and behavioral interventions based on individuals' characteristics, and can help stakeholders take proactive childcare measures. The study may serve as pioneering evidence for future research, and researchers are encouraged to use deep learning algorithms to improve prediction accuracy.