Integrating deep learning and regression models for accurate prediction of groundwater fluoride contamination in old city in Bitlis province, Eastern Anatolia Region, Türkiye.

Affiliations

Medical Services and Techniques Department, Bitlis Eren University, 13000, Bitlis, Türkiye.

Department of Computer Engineering, Harran University, 63050, Şanlıurfa, Türkiye.

Publication information

Environ Sci Pollut Res Int. 2024 Jul;31(34):47201-47219. doi: 10.1007/s11356-024-34194-w. Epub 2024 Jul 11.

Abstract

Groundwater resources in Bitlis province and its surroundings, in Türkiye's Eastern Anatolia Region, are pivotal for drinking water, yet they face a significant threat from fluoride contamination, compounded by the region's volcanic rock structure. To address this concern, fluoride levels were measured at 30 points during the dry period (June 2019) and the rainy period (September 2019). Although present measurement techniques are accurate, their time-consuming nature renders them economically unviable. Therefore, this study aims to assess the distribution of probable geogenic contamination of groundwater and to develop a robust prediction model by analyzing the relationship between predictive variables and target contaminants. To this end, various machine learning and regression models, including Linear Regression, Random Forest, Decision Tree, K-Neighbors, and XGBoost, as well as deep learning models such as ANN, DNN, CNN, and LSTM, were employed. The elements aluminum (Al), boron (B), cadmium (Cd), cobalt (Co), chromium (Cr), copper (Cu), iron (Fe), manganese (Mn), nickel (Ni), phosphorus (P), lead (Pb), and zinc (Zn) were used as features to predict fluoride levels. The SelectKbest feature selection method was used to improve the accuracy of the prediction model. This method identifies the most important features in the dataset for different values of k, which increases model efficiency and allowed the models to produce more accurate predictions. The findings highlight the superior performance of the XGBoost regressor and CNN in predicting groundwater quality, with XGBoost consistently outperforming other models, exhibiting the lowest values for evaluation metrics such as mean squared error (MSE), mean absolute error (MAE), and root mean squared error (RMSE) across different k values. For instance, when considering all features, XGBoost attained an MSE of 0.07, an MAE of 0.22, an RMSE of 0.27, a MAPE of 9.25%, and an NSE of 0.75. Conversely, the Decision Tree regressor consistently displayed inferior performance, with its maximum MSE reaching 0.11 (k = 5) and its maximum RMSE reaching 0.33 (k = 5). Furthermore, feature selection analysis revealed the consistent significance of boron (B) and cadmium (Cd) across all datasets, underscoring their pivotal roles in groundwater contamination. Notably, in the machine learning framework evaluation, the XGBoost regressor excelled in modeling both the "all" and "rainy season" datasets, while the convolutional neural network (CNN) performed best on the "dry season" dataset. This study emphasizes the potential of the XGBoost regressor and CNN for accurate groundwater quality prediction and recommends their use, while acknowledging the limitations of the Decision Tree regressor.
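The abstract names a top-k feature selection step and five evaluation metrics without spelling them out, so a minimal sketch may help readers follow the reported numbers. It is not the authors' pipeline. The function names `regression_metrics`, `pearson_r`, and `top_k_features` are hypothetical, the sample values are made up, and ranking features by absolute Pearson correlation is an assumption: the abstract does not state which scoring function was paired with SelectKbest. NSE is taken here to be the Nash-Sutcliffe efficiency in its usual form, one minus the ratio of residual to total sum of squares.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MSE, MAE, RMSE, MAPE (in percent), and NSE for paired values."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equal in length")
    errors = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)
    mse = ss_res / n
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(mse)
    # MAPE is undefined if any observed value is zero; this sketch assumes none are.
    mape = 100.0 * sum(abs(e / t) for e, t in zip(errors, y_true)) / n
    mean_obs = sum(y_true) / n
    ss_tot = sum((t - mean_obs) ** 2 for t in y_true)
    # Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 matches predicting the mean.
    nse = 1.0 - ss_res / ss_tot
    return {"mse": mse, "mae": mae, "rmse": rmse, "mape": mape, "nse": nse}


def pearson_r(x, y):
    """Pearson correlation coefficient; returns 0.0 for a constant input."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def top_k_features(features, target, k):
    """Keep the k feature names with the largest |Pearson r| against the target.

    `features` maps a feature name to its list of values (assumed layout).
    """
    ranked = sorted(
        features,
        key=lambda name: abs(pearson_r(features[name], target)),
        reverse=True,
    )
    return ranked[:k]


if __name__ == "__main__":
    # Made-up illustrative values, not data from the study.
    observed = [1.0, 2.0, 3.0, 4.0]
    predicted = [1.1, 1.9, 3.2, 3.8]
    print(regression_metrics(observed, predicted))

    candidates = {
        "feat_a": [1.0, 2.0, 3.0, 4.0],
        "feat_b": [1.0, -1.0, 1.0, -1.0],
        "feat_c": [4.0, 3.0, 2.5, 1.0],
    }
    print(top_k_features(candidates, observed, k=2))
```

Because RMSE is the square root of MSE, the reported pairs (0.07 and 0.27 for XGBoost, 0.11 and 0.33 for the Decision Tree) are consistent with each other to within rounding.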

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6627/11296968/251a1a84c21c/11356_2024_34194_Fig1_HTML.jpg
