Comparison of individual and ensemble machine learning models for prediction of sulphate levels in untreated and treated Acid Mine Drainage.

Affiliations

Molecular Sciences Institute, School of Chemistry, University of the Witwatersrand, Private Bag X3, Johannesburg, 2050, South Africa.

Pharmacy Department, School of Healthcare Sciences, University of Limpopo, Turfloop Campus, Polokwane, 0727, South Africa.

Publication information

Environ Monit Assess. 2024 Mar 2;196(4):332. doi: 10.1007/s10661-024-12467-8.

Abstract

Machine learning (ML) was used to provide data for further evaluation of the potential extraction of octathiocane (S₈), a commercially useful by-product, from Acid Mine Drainage (AMD) by predicting sulphate levels in an AMD water quality dataset. Individual ML regressor models, namely Linear Regression (LR), Least Absolute Shrinkage and Selection Operator (LASSO), Ridge (RD), Elastic Net (EN), K-Nearest Neighbours (KNN), Support Vector Regression (SVR), Decision Tree (DT), Extreme Gradient Boosting (XGBoost), Random Forest (RF) and Multi-Layer Perceptron Artificial Neural Network (MLP), as well as Stacking Ensemble (SE-ML) combinations of these models, were successfully used to predict sulphate levels. An SE-ML regressor trained on untreated AMD, which stacked seven of the best-performing individual models and fed their predictions to an LR meta-learner, was the best-performing model, with a Mean Squared Error (MSE) of 0.000011, a Mean Absolute Error (MAE) of 0.002617 and an R² of 0.9997. Temperature (°C), Total Dissolved Solids (mg/L) and, importantly, iron (mg/L) were highly correlated with sulphate (mg/L), with iron showing a strong positive linear correlation that indicated dissolved products from pyrite oxidation. Ensemble learning methods (bagging, boosting and stacking) outperformed individual methods owing to their combined predictive accuracies. Surprisingly, when the SE-ML that combined all models was compared with the SE-ML that combined only the best-performing models, there was only a slight difference in model accuracy, which indicated that including poorly performing models in the stack had no adverse effect on its predictive performance.
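The stacking arrangement described in the abstract (several base regressors whose predictions feed an LR meta-learner, scored by MSE, MAE and R²) can be sketched with scikit-learn as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the data are synthetic, the three feature columns are hypothetical stand-ins for temperature, Total Dissolved Solids and iron, only four base models are used rather than seven, and all hyperparameters are arbitrary choices.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor, StackingRegressor
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in data (NOT the paper's dataset). The three columns are
# hypothetical proxies for temperature, TDS and iron, already scaled to [0, 1].
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(300, 3))
y = 0.2 * X[:, 0] + 0.5 * X[:, 1] + 0.3 * X[:, 2] + rng.normal(0.0, 0.01, 300)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Base (level-0) regressors; an assumed subset of the model families
# named in the abstract, with arbitrary hyperparameters.
base_models = [
    ("ridge", Ridge(alpha=1.0)),
    ("knn", make_pipeline(MinMaxScaler(), KNeighborsRegressor(n_neighbors=5))),
    ("dt", DecisionTreeRegressor(max_depth=5, random_state=0)),
    ("rf", RandomForestRegressor(n_estimators=50, random_state=0)),
]

# Level-1 meta-learner: linear regression fitted on the base models'
# out-of-fold predictions (5-fold cross-validation).
stack = StackingRegressor(
    estimators=base_models,
    final_estimator=LinearRegression(),
    cv=5,
)
stack.fit(X_train, y_train)
y_pred = stack.predict(X_test)

# The three metrics reported in the abstract.
mse = mean_squared_error(y_test, y_pred)
mae = mean_absolute_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print(f"MSE={mse:.6f}  MAE={mae:.6f}  R2={r2:.4f}")
```

Because the meta-learner assigns a weight to each base model's predictions, a weak base model can simply receive a small weight, which is one plausible reading of the abstract's observation that including poorly performing models barely changed accuracy.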

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/58bb/10907470/40823326ebde/10661_2024_12467_Fig1_HTML.jpg
