School of Postgraduate Studies and Research, Amoud University, Amoud Valley, Borama, Awdal, 25263, Somalia.
Department of Mathematics, LMNO, CNRS-Université de Caen, Campus II, Science 3, 14032, Caen, France.
Sci Rep. 2024 Mar 12;14(1):5956. doi: 10.1038/s41598-024-56466-8.
Extensive research on poverty in developing countries has relied on conventional regression analysis, which has limited predictive capability. This study addresses that gap by applying advanced machine learning (ML) methods to predict poverty in Somalia. It uses a cross-sectional design based on data from the 2020 Somalia Demographic and Health Survey (SDHS), the first survey of its kind in the country. The ML methods, namely random forest (RF), decision tree (DT), support vector machine (SVM), and logistic regression, are tested and applied using R version 4.1.2, while the conventional methods are analyzed using STATA version 17. The predictive models are assessed with the confusion matrix, accuracy, precision, sensitivity (recall), specificity, F1 score, and area under the receiver operating characteristic curve (AUROC). Poverty in Somalia is widespread: approximately seven out of ten Somalis live in poverty, one of the highest rates in the region. Among nomadic pastoralists, agro-pastoralists, and internally displaced persons (IDPs), the average poverty rate is 69%, while urban areas have a lower rate of 60%. Prediction accuracy for the advanced ML methods ranged from 67.21% to 98.36%, with the RF model performing best. The results identify geographical region, household size, respondent age group, husband's employment status, age of household head, and place of residence as the top six predictors of poverty in Somalia. The findings highlight the potential of ML methods to predict poverty and to uncover information that traditional statistical methods cannot detect, with the RF model identified as the best classifier for predicting poverty in Somalia.
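The evaluation metrics named in the abstract all derive from the four cells of a binary confusion matrix. The following is a minimal Python sketch of those definitions, not the authors' R code; the function name and the counts in the usage example are hypothetical and are not taken from the study. Note that sensitivity and recall are the same quantity.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute standard binary-classification metrics from confusion-matrix counts.

    tp, fp, fn, tn are the true-positive, false-positive, false-negative and
    true-negative counts, with "poor" treated as the positive class (an assumption).
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    # Guard each ratio against an empty denominator.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0  # identical to recall
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "sensitivity": sensitivity,
        "recall": sensitivity,
        "specificity": specificity,
        "f1": f1,
    }


# Hypothetical counts for illustration only (not results from the paper).
metrics = classification_metrics(tp=60, fp=10, fn=5, tn=25)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

AUROC is not included here because it requires predicted scores or probabilities rather than a single confusion matrix.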