Department of Statistics, College of Science, Bahir Dar University, Bahir Dar, Ethiopia.
School of Mathematics, Statistics and Computer Science, College of Agriculture, Engineering and Science, University of KwaZulu-Natal, Durban, South Africa.
BMC Med Inform Decis Mak. 2021 Oct 24;21(1):291. doi: 10.1186/s12911-021-01652-1.
Undernutrition is the main cause of child death in developing countries. This paper aimed to explore the efficacy of machine learning (ML) approaches in predicting under-five undernutrition in Ethiopian administrative zones and to identify the most important predictors.
The study applied ML techniques to nationally representative, retrospective cross-sectional survey data from Ethiopia, collected in 2000, 2005, 2011, and 2016. We explored six commonly used ML algorithms: logistic regression, the Least Absolute Shrinkage and Selection Operator (LASSO; L1-regularized logistic regression), L2 regularization (ridge), elastic net, neural network, and random forest (RF). Sensitivity, specificity, accuracy, and area under the curve were used to evaluate the performance of these models.
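As an illustration of the four evaluation measures named above, the following minimal sketch computes them for a binary classifier from true labels and predicted scores. It is not the authors' code: the function names, the 0.5 decision threshold, the coding of an undernourished zone as 1, and the toy data are all assumptions made for this example. The area under the curve is taken here to be the area under the ROC curve, computed through its rank interpretation (the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half).

```python
from typing import Dict, Sequence


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    # Each positive/negative pair contributes 1 if the positive scores higher,
    # 0.5 on a tie, and 0 otherwise.
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def evaluate(
    y_true: Sequence[int],
    scores: Sequence[float],
    threshold: float = 0.5,  # assumed cut-off, not taken from the paper
) -> Dict[str, float]:
    """Return sensitivity, specificity, accuracy, and AUC for binary labels (1 = positive)."""
    y_pred = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        "sensitivity": tp / (tp + fn),       # true positive rate
        "specificity": tn / (tn + fp),       # true negative rate
        "accuracy": (tp + tn) / len(y_true),
        "auc": auc(y_true, scores),
    }


if __name__ == "__main__":
    # Hypothetical toy data: 1 = undernourished zone, 0 = otherwise.
    labels = [1, 1, 1, 0, 0, 0, 1, 0]
    predicted = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1, 0.7, 0.2]
    for name, value in evaluate(labels, predicted).items():
        print(f"{name}: {value:.4f}")
```

Applying the same `evaluate` function to the held-out predictions of each candidate model would give one row of metrics per model, which is one way such a comparison could be tabulated.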
Based on these performance measures, the RF algorithm was selected as the best ML model. In order of importance, urban-rural settlement, literacy rate of parents, and place of residence were the major determinants of disparities in the nutritional status of under-five children among Ethiopian administrative zones.
Our results showed that the machine learning classification algorithms considered can effectively predict under-five undernutrition status in Ethiopian administrative zones. Persistent under-five undernutrition was found in the northern part of Ethiopia. The identification of such high-risk zones could provide useful information to decision-makers trying to reduce child undernutrition.