Baziar Mansour, Yousefi Mahmood, Oskoei Vahide, Makhdoomi Ahmad, Abdollahzadeh Reza, Dehghan Aliakbar
Department of Environmental Health Engineering, Ferdows Faculty of Medical Sciences, Birjand University of Medical Sciences, Birjand, Iran.
Department of Environmental Health Engineering, School of Public Health, Khoy University of Medical Sciences, Khoy, Iran.
Sci Rep. 2025 Apr 26;15(1):14589. doi: 10.1038/s41598-025-99432-8.
This study applied four machine learning techniques, XGBoost, Extra Trees, CatBoost, and Multiple Linear Regression (MLR), to model the heating value of municipal solid waste. The model inputs were the weight of the dry sample (kg) and the content of carbon (C), hydrogen (H), oxygen (O), nitrogen (N), sulfur (S), and ash, each in kg. The Extra Trees model with tuned hyperparameters performed best, achieving R values of 0.999 on the training set and 0.979 on the testing set. Its Mean Squared Error (MSE) on the testing set was 77,455.92, and its Mean Absolute Error (MAE) and Mean Absolute Percentage Error (MAPE) were 245.886 and 16.22%, respectively, supporting the model's predictive accuracy and reliability. XGBoost and CatBoost also showed strong predictive capability with high R values, but Extra Trees outperformed them with significantly lower error metrics. In contrast, MLR, used as a conventional technique, showed moderate performance, suggesting a trade-off between explanatory power and predictive accuracy. In the feature importance analysis of the best model, Extra Trees, nitrogen content was the most influential input, followed by sulfur content, ash content, and dry sample weight, in descending order of importance.
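The error metrics quoted in the abstract can be made concrete with a minimal sketch. It assumes the standard textbook definitions of MSE, MAE, and MAPE, and it assumes that "R" denotes the Pearson correlation between measured and predicted heating values; the abstract does not state either definition. The sample arrays are hypothetical illustration values, not data from the study.

```python
import math

def mse(y_true, y_pred):
    """Mean Squared Error: average of squared residuals."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def mae(y_true, y_pred):
    """Mean Absolute Error: average of absolute residuals."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def mape(y_true, y_pred):
    """Mean Absolute Percentage Error, in percent.
    Undefined if any true value is zero."""
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)

def pearson_r(y_true, y_pred):
    """Pearson correlation coefficient (assumed meaning of 'R')."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov / math.sqrt(var_t * var_p)

# Hypothetical measured and predicted heating values (NOT from the paper).
y_true = [1000.0, 2000.0, 3000.0, 4000.0]
y_pred = [1100.0, 1900.0, 3200.0, 3900.0]

sample_mse = mse(y_true, y_pred)
sample_mae = mae(y_true, y_pred)
sample_mape = mape(y_true, y_pred)
sample_r = pearson_r(y_true, y_pred)

# MSE is in squared units, so its square root (RMSE) is what can be
# compared directly with the MAE. This uses the test-set MSE reported
# in the abstract.
rmse_reported = math.sqrt(77455.92)

print(f"sample MSE={sample_mse:.1f} MAE={sample_mae:.1f} "
      f"MAPE={sample_mape:.2f}% R={sample_r:.4f}")
print(f"RMSE implied by the reported test MSE: {rmse_reported:.2f}")
```

Because MSE is expressed in squared units, the reported value of 77,455.92 corresponds to a root mean squared error of about 278.3 in the units of the heating value, which is the same scale as the reported MAE of 245.886.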