Satapathy Sandeep Kumar, Saravanan Shreyaa, Mishra Shruti, Mohanty Sachi Nandan
School of Computer Science and Engineering, Vellore Institute of Technology, Chennai, Vandalur-Kelambakkam Road, Chennai, Tamil Nadu 600127, India.
School of Computer Science and Engineering (SCOPE), VIT-AP University, Amaravati, Andhra Pradesh, India.
New Gener Comput. 2023;41(1):155-184. doi: 10.1007/s00354-023-00203-8. Epub 2023 Feb 1.
Poverty remains a glaring issue in the twenty-first century, despite concerted efforts by organizations to eliminate it. Predicting poverty with machine learning can offer practical models to support poverty elimination. This paper uses Multidimensional Poverty Index data from the Oxford Poverty and Human Development Initiative for the years 2019 and 2021 to predict multidimensional poverty before and during the pandemic. Several poverty indicators under health, education, and living standards are taken into consideration. The work applies data analysis techniques such as feature correlation, feature selection, and graphical visualization to answer research questions about poverty. Various machine learning algorithms, such as Multiple Linear Regression, Decision Tree Regressor, Random Forest Regressor, XGBoost, AdaBoost, Gradient Boosting, Linear Support Vector Regressor (SVR), Ridge Regression, Lasso Regression, ElasticNet Regression, and K-Nearest Neighbor Regression, are implemented to predict poverty across four datasets at the national and subnational levels. Regularization is used to improve model performance, and cross-validation is used for estimation. Through a rigorous analysis and comparison of the different models, this work identifies important poverty determinants and concludes that, overall, the Ridge Regression model performs best, with the highest score.
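As a rough illustration of how regularization and cross-validation fit together in a comparison like the one the abstract describes, the following minimal sketch scores ridge regression at several penalty strengths using k-fold cross-validated R². It is not the authors' pipeline: the data are synthetic stand-ins for poverty indicators, and the helper names (`ridge_fit`, `r2_score`, `cross_val_r2`), the penalty grid, and the fold count are all assumptions made for this example.

```python
import numpy as np


def ridge_fit(X, y, alpha):
    """Closed-form ridge regression; the intercept is left unpenalized."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean  # centering handles the intercept
    n_features = X.shape[1]
    # Solve (Xc^T Xc + alpha * I) w = Xc^T yc; alpha = 0 gives ordinary least squares.
    coef = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(n_features), Xc.T @ yc)
    intercept = y_mean - x_mean @ coef
    return coef, intercept


def r2_score(y_true, y_pred):
    """Coefficient of determination."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


def cross_val_r2(X, y, alpha, k=5, seed=0):
    """Mean R^2 over k shuffled folds; each fold is held out exactly once."""
    idx = np.random.default_rng(seed).permutation(len(y))
    folds = np.array_split(idx, k)
    scores = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        coef, intercept = ridge_fit(X[train], y[train], alpha)
        scores.append(r2_score(y[test], X[test] @ coef + intercept))
    return float(np.mean(scores))


# Synthetic data (assumption): 10 made-up indicators, only 3 of which matter.
rng = np.random.default_rng(42)
n_samples = 200
X = rng.normal(size=(n_samples, 10))
true_coef = np.array([0.5, 0.3, 0.2] + [0.0] * 7)
y = X @ true_coef + rng.normal(scale=0.1, size=n_samples)

# Compare penalty strengths on the same folds and keep the best mean score.
results = {alpha: cross_val_r2(X, y, alpha) for alpha in (0.0, 0.1, 1.0, 10.0)}
best_alpha = max(results, key=results.get)

for alpha, score in results.items():
    print(f"alpha={alpha:<5} mean CV R^2={score:.4f}")
print("best alpha:", best_alpha)
```

The same loop extends to other regressors by swapping out the fitting function, which is how a single cross-validated score can rank otherwise different models.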