Chattopadhyay Arghya, Singh Anand Prakash, Kumar Siddharth, Pati Jayadeep, Rakshit Amitava
Department of Soil Science & Agricultural Chemistry, Institute of Agricultural Sciences, Banaras Hindu University, Varanasi, Uttar Pradesh 221005, India.
Water Sci Technol. 2023 Aug;88(3):595-614. doi: 10.2166/wst.2023.231.
Arsenic contamination of groundwater, from natural or anthropogenic sources, poses carcinogenic and non-carcinogenic risks to humans and the ecosystem. Groundwater samples were collected across the Varanasi region of Uttar Pradesh, India, and their physicochemical properties were determined in the laboratory. This paper analyzes these properties using machine learning, descriptive statistics, and geostatistical and spatial analysis. Pearson correlation was used for feature selection, and highly correlated features were selected for model building. The hydrochemical facies of the study area were analyzed, and the hyperparameters of four machine learning models (multilayer perceptron, random forest (RF), naïve Bayes, and decision tree) were optimized before the models were trained and tested to classify groundwater samples as high (1) or low (0) in arsenic contamination, based on the WHO guideline value of 10 μg/L. The models were compared on accuracy, sensitivity, and specificity. The RF algorithm outperformed the other classifiers, with an accuracy of 92.30%, a sensitivity of 100%, and a specificity of 75%. The accuracy was compared with that of prior research, and the machine learning model may be used to continually monitor arsenic pollution in groundwater.
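The labeling rule and the three reported metrics can be illustrated with a minimal Python sketch. This is not the authors' code: the function names are hypothetical, treating a concentration strictly above 10 μg/L as "high" is an assumption (the abstract does not say how a value exactly at the guideline is handled), and the example counts are invented purely for illustration, since the abstract does not give the test-set size.

```python
# Minimal sketch (not the authors' implementation): binary arsenic labels
# from the WHO guideline value, and accuracy/sensitivity/specificity
# computed from true and predicted labels.

WHO_GUIDELINE_UG_L = 10.0  # WHO guideline value for arsenic, in ug/L


def label_sample(arsenic_ug_l, threshold=WHO_GUIDELINE_UG_L):
    """Return 1 (high) if the concentration is above the threshold, else 0 (low).

    Assumption: a value exactly equal to the threshold is labeled low.
    """
    return 1 if arsenic_ug_l > threshold else 0


def classification_metrics(y_true, y_pred):
    """Compute accuracy, sensitivity, and specificity for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # high, predicted high
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # low, predicted low
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # low, predicted high
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # high, predicted low
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
    }


# Hypothetical test set: 9 high samples (all detected) and 4 low samples
# (one misclassified as high). These counts are illustrative only.
y_true = [1] * 9 + [0] * 4
y_pred = [1] * 9 + [1] + [0] * 3
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

With these made-up counts, sensitivity is 100%, specificity is 75%, and accuracy is 12/13 (about 92.3%), which shows how a classifier can miss no high-arsenic sample while still raising false alarms on a quarter of the low-arsenic ones.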