Elstohy Rasha, Aneis Nevein, Mounir Ali Eman
Department of Information Systems, Obour Institutes, Al-Sharqia, Al-Sharqia, Egypt.
Department of Information communication Technology, New Cairo Technological University, Cairo, New Cairo, Egypt.
PeerJ Comput Sci. 2024 Nov 11;10:e2430. doi: 10.7717/peerj-cs.2430. eCollection 2024.
Low labor force participation among Egyptian women has been a chronic economic problem in Egypt. Despite improvements in human capital, in both education and health indicators, female labor force participation remains persistently low. This study proposes a hybrid machine-learning model that integrates principal component analysis (PCA) for feature extraction with various machine learning and time-series models to predict women's employment in times of crisis. Several machine learning (ML) algorithms, namely support vector machine (SVM), neural network, K-nearest neighbor (KNN), linear regression, random forest, and AdaBoost, together with two popular time-series models, autoregressive integrated moving average (ARIMA) and vector autoregressive (VAR), were applied to an actual dataset from the public sector. The manpower dataset records gender across different regions, ages, and educational levels. The models were then trained, tested, and evaluated on this dataset. For performance validation, forecasting accuracy was measured using mean squared error (MSE), root mean squared error (RMSE), mean absolute error (MAE), mean absolute percent error (MAPE), R-squared (R2), and cross-validated root mean squared error (CVRMSE). A Dickey-Fuller test was also performed to evaluate and compare the accuracy of the applied models, and the results showed that AdaBoost outperforms the other methods, with an accuracy of 100%. Compared to alternative works, our findings provide a comprehensive comparative analysis for predicting women's participation in different regions during an economic crisis.
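The error metrics named in the abstract can be illustrated with a minimal sketch using their standard textbook definitions. This is not the authors' code: the function name `regression_metrics` and the sample values are hypothetical, chosen only for illustration. CVRMSE is left out because it depends on a cross-validation setup that the abstract does not specify.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MSE, RMSE, MAE, MAPE (in percent) and R2 for two equal-length lists.

    Hypothetical helper for illustration; standard definitions only.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    if any(t == 0 for t in y_true):
        raise ValueError("MAPE is undefined when a true value is zero")

    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]

    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)
    mae = sum(abs(e) for e in errors) / n
    mape = 100.0 * sum(abs(e / t) for e, t in zip(errors, y_true)) / n

    # R2 = 1 - (residual sum of squares / total sum of squares)
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")

    return {"mse": mse, "rmse": rmse, "mae": mae, "mape": mape, "r2": r2}


# Illustrative values only (not taken from the study's dataset).
observed = [100.0, 200.0, 300.0, 400.0]
predicted = [110.0, 190.0, 310.0, 390.0]
print(regression_metrics(observed, predicted))
```

Reporting several metrics side by side is useful because they penalize errors differently: MSE and RMSE weight large errors more heavily, MAE treats all errors linearly, and MAPE expresses error relative to the size of the true value.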