Department of Computer Science and Engineering, Indian Institute of Information Technology Ranchi, Ranchi, Jharkhand 834010, India
J Water Health. 2022 May;20(5):829-848. doi: 10.2166/wh.2022.015.
This paper presents a machine learning approach for classifying arsenic (As) levels in groundwater samples collected from the Indo-Gangetic region as safe or unsafe. Because water is essential for sustaining life, contamination by heavy metals such as arsenic is a public health concern. According to World Health Organization (WHO) guidelines, the arsenic concentration in water should not exceed 10 μg/L. In this study, several tree-based machine learning models, namely Random Forest, Optimized Forest, CS Forest, SPAARC, and REP Tree, were applied to classify the water samples. The groundwater quality parameters were ranked using a classifier attribute evaluator for training and testing the models. Metrics derived from the confusion matrix, namely accuracy, precision, recall, and false positive rate (FPR), were used to analyze model performance. Among all models, Optimized Forest outperformed the other classifiers, with a high accuracy of 80.64%, a precision of 80.70%, a recall of 97.87%, and a low FPR of 73.33%. The Optimized Forest model can be used to classify arsenic levels in new groundwater samples.
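The four reported metrics all follow from the counts in a binary confusion matrix. The sketch below is a minimal illustration, not the authors' code: the abstract does not give the confusion matrix, so the counts used here (46, 1, 11, 4) are hypothetical values chosen only because they are arithmetically consistent with the reported percentages. Treating "safe" as the positive class, and labelling a sample safe when its arsenic concentration is at or below the WHO limit, are likewise assumptions; `label_sample` and `ConfusionMatrix` are illustrative names.

```python
from dataclasses import dataclass

# WHO guideline value quoted in the abstract (micrograms per litre).
WHO_ARSENIC_LIMIT_UG_PER_L = 10.0


def label_sample(arsenic_ug_per_l: float) -> str:
    """Label a sample 'safe' if it does not exceed the WHO limit (assumed rule)."""
    return "safe" if arsenic_ug_per_l <= WHO_ARSENIC_LIMIT_UG_PER_L else "unsafe"


@dataclass(frozen=True)
class ConfusionMatrix:
    """Binary confusion-matrix counts (hypothetical helper, not from the paper)."""
    tp: int  # positives predicted positive
    fn: int  # positives predicted negative
    fp: int  # negatives predicted positive
    tn: int  # negatives predicted negative

    def accuracy(self) -> float:
        return (self.tp + self.tn) / (self.tp + self.fn + self.fp + self.tn)

    def precision(self) -> float:
        return self.tp / (self.tp + self.fp)

    def recall(self) -> float:
        return self.tp / (self.tp + self.fn)

    def false_positive_rate(self) -> float:
        return self.fp / (self.fp + self.tn)


# Hypothetical counts, assumed for illustration only (62 test samples in total).
cm = ConfusionMatrix(tp=46, fn=1, fp=11, tn=4)

for name, value in [
    ("accuracy", cm.accuracy()),
    ("precision", cm.precision()),
    ("recall", cm.recall()),
    ("FPR", cm.false_positive_rate()),
]:
    print(f"{name}: {value:.4f}")
```

Under these assumed counts, the FPR is the fraction of negative-class samples that the model assigns to the positive class, so it depends only on the negative-class row of the matrix and can stay high even when accuracy and recall look strong.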