Yaseen Zaher Mundher, Alhalimi Farah Loui
Civil and Environmental Engineering Department, King Fahd University of Petroleum & Minerals, Dhahran, 31261, Saudi Arabia.
Sci Rep. 2025 Apr 18;15(1):13434. doi: 10.1038/s41598-025-96271-5.
Contamination of water and soils with heavy metals poses a significant environmental threat, making the development of effective removal strategies a global priority. Determining heavy-metal levels can therefore play an essential role in environmental monitoring and assessment. In this study, ensemble machine learning (ML) models, namely Random Forest Regressor (RFR), Adaptive Boosting (AdaBoost), Gradient Boosting (GB), HistGradientBoosting, Extreme Gradient Boosting (XGBoost), and Light Gradient-Boosting Machine (LightGBM), were applied to predict the adsorption efficiency of several heavy metals (Pb, Cd, Ni, Cu, and Zn) from factors including temperature, pH, and biochar characteristics. The dataset comprised 353 samples collected from the open literature. In the first stage, the data were preprocessed, including outlier removal and scaling, to make them more suitable for modeling; in the second stage, the predictive models were developed. The XGBoost model was the most accurate of the models tested, achieving the highest determination coefficient (R = 0.92). Feature importance analysis indicated that the initial concentration ratio of metals to biochar and pH were the most influential factors for adsorption efficiency, followed by pyrolysis temperature, while physical properties such as surface area and pore structure had minimal effect. These findings highlight the value of ensemble ML models in guiding heavy-metal removal solutions: they provide efficient predictions and ease the selection of a suitable environmental application.
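The abstract mentions outlier removal, scaling, and a determination coefficient but does not say how each was carried out. The following is a minimal standard-library sketch of those three steps under stated assumptions: Tukey's 1.5 × IQR rule for outliers and min-max scaling are illustrative choices, not the authors' confirmed methods, and the function names `remove_outliers_iqr`, `min_max_scale`, and `r_squared` are hypothetical.

```python
from statistics import mean, quantiles


def remove_outliers_iqr(values, k=1.5):
    """Keep values inside [Q1 - k*IQR, Q3 + k*IQR].

    Assumption: Tukey's rule; the abstract does not name the outlier criterion.
    """
    q1, _, q3 = quantiles(values, n=4)  # quartiles of the sample
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if low <= v <= high]


def min_max_scale(values):
    """Rescale values linearly to [0, 1].

    Assumption: min-max scaling; the abstract only says "scaling".
    """
    low, high = min(values), max(values)
    if high == low:
        # A constant feature carries no spread; map it to zeros.
        return [0.0 for _ in values]
    return [(v - low) / (high - low) for v in values]


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot
```

In a full pipeline, the filtering and scaling would be applied per feature (for example pH or pyrolysis temperature) before fitting each ensemble model, and `r_squared` would be evaluated on held-out predictions to compare the models.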