Liu Hui-Tian, Hu Da-Wei
College of Transportation Engineering, Chang'an University, Xi'an 710064, China.
Huan Jing Ke Xue. 2024 Jun 8;45(6):3421-3432. doi: 10.13227/j.hjkx.202305234.
To address carbon emissions in the transportation sector, this study built predictive models with multiple machine learning algorithms, using panel data from 30 Chinese provinces for 2005 to 2019. The aim was to identify the best-performing algorithm and the key factors influencing transportation carbon emissions, giving policymakers and decision-makers a reference for reducing emissions and promoting the sustainable development of the sector. First, drawing on the concept of the fixed effects model, we included heterogeneity among provinces as an explanatory factor. We then combined Pearson's correlation coefficient and Spearman's rank correlation coefficient to screen 18 candidate factors influencing transportation carbon emissions. Seven common machine learning algorithms were preselected and trained with the screened factors as explanatory variables, and the three best-performing algorithms were further optimized and retrained. We then applied K-fold cross-validation, plotted learning curves to test each predictive model, and used MSE, MAE, R², and MAPE as evaluation indicators to determine the best model. SHAP values were used to quantify the importance of each explanatory variable in the optimal model. The results indicated that multicollinearity among seven factors (provincial differences, total consumption of social goods, urban green space area, freight turnover, number of private cars, transportation industry output, and permanent population) was weak, and all seven passed the significance test, so they could serve as explanatory variables in the prediction model of transportation carbon emissions.
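The correlation-based screening step can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the authors' procedure: the abstract does not give the screening rule or cut-offs, so the `screen` function, its `threshold=0.5` default, and the toy data are all hypothetical. The sketch keeps a candidate factor only when both its Pearson and its Spearman correlation with the target are strong in absolute value.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

def screen(candidates, target, threshold=0.5):
    """Hypothetical rule: keep factors with |Pearson| and |Spearman| >= threshold."""
    kept = []
    for name, values in candidates.items():
        strong_linear = abs(pearson(values, target)) >= threshold
        strong_monotonic = abs(spearman(values, target)) >= threshold
        if strong_linear and strong_monotonic:
            kept.append(name)
    return kept

# Toy data, invented for illustration only (not from the study).
emissions = [10.0, 12.0, 15.0, 19.0, 26.0]
candidates = {
    "private_cars": [1.0, 2.0, 3.0, 4.0, 5.0],  # rises with emissions
    "noise": [3.0, 1.0, 4.0, 1.0, 3.0],         # no clear relation
}
selected = screen(candidates, emissions)
```

Using both coefficients guards against a factor that looks strong only because of a few extreme values (high Pearson, low Spearman) or only in rank order (the reverse). A multicollinearity check among the retained factors, which the abstract also mentions, would apply the same two functions pairwise between factors.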
The Random Forest and XGBoost algorithms both predicted well, with R² values above 0.97 and errors below 10%, and showed no signs of overfitting or underfitting. Of the two, XGBoost performed best, whereas the KNN algorithm performed poorly. The explanatory variables ranked by importance as follows: provincial differences > total consumption of social goods > number of private cars > permanent population > freight turnover > urban green space area > transportation industry output. A combined analysis of correlation and importance showed that provincial differences were an indispensable variable in predicting transportation carbon emissions. In conclusion, this study provides a new approach to the governance of carbon emissions in the transportation industry, and the results can serve as a reference for policymakers and decision-makers. In future policy design and decision-making, the distinctive characteristics of each province should not be overlooked, and region-specific measures should be formulated to promote the sustainable development of the transportation industry.
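For readers who want to reproduce this kind of evaluation, the four indicators and a K-fold split can be computed as in the sketch below. It is not the authors' code: the function names, the contiguous (unshuffled) fold layout, and the toy numbers are assumptions, and the abstract does not say how folds were drawn from the panel data or what value of K was used.

```python
def regression_metrics(y_true, y_pred):
    """Return MSE, MAE, R^2 and MAPE (in percent) for paired observations."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)
    mse = ss_res / n
    mae = sum(abs(e) for e in errors) / n
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot
    # MAPE is undefined when a true value is zero; this sketch assumes none are.
    mape = 100.0 * sum(abs(e / t) for e, t in zip(errors, y_true)) / n
    return {"MSE": mse, "MAE": mae, "R2": r2, "MAPE": mape}

def k_fold_indices(n_samples, k):
    """Yield (train, test) index lists for k contiguous, near-equal folds."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size

# Toy values, invented for illustration only (not from the study).
y_true = [100.0, 200.0, 300.0, 400.0]
y_pred = [110.0, 190.0, 310.0, 390.0]
scores = regression_metrics(y_true, y_pred)

# Ten samples split into three folds of sizes 4, 3 and 3.
folds = list(k_fold_indices(10, 3))
```

In practice each model would be fitted on the `train` indices and scored on the `test` indices of every fold, and the per-fold metrics averaged. With province-by-year panel data, shuffling or grouping the folds by province may be preferable to the contiguous layout used here.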