JIIT, Noida, India.
JIIT, Noida, India.
J Water Health. 2024 Aug;22(8):1387-1408. doi: 10.2166/wh.2024.063. Epub 2024 Jul 11.
India has been dealing with fluoride contamination of groundwater for the past few decades. Long-term exposure to fluoride can cause skeletal and dental fluorosis. Therefore, an in-depth exploration of fluoride concentrations in different parts of India is desirable. This work employs machine learning algorithms to analyze fluoride concentrations in five major affected Indian states (Andhra Pradesh, Rajasthan, Tamil Nadu, Telangana and West Bengal). A correlation matrix was used to identify appropriate predictor variables for fluoride prediction. The algorithms used for prediction were K-nearest neighbor (KNN), logistic regression (LR), random forest (RF), support vector classifier (SVC), Gaussian naive Bayes (NB), multilayer perceptron (MLP) classifier, decision tree classifier, gradient boosting classifier, and soft and hard voting classifiers. Model performance was assessed using accuracy, precision, recall, error rate and the receiver operating characteristic (ROC) curve. Because the dataset was skewed, model performance was evaluated both before and after resampling. The results indicate that the RF model performed best for predicting fluoride contamination in groundwater in these Indian states.
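The evaluation described above can be illustrated with a minimal Python sketch. It is not the authors' code. It assumes a binary label (1 = fluoride above some chosen limit, 0 = below), uses made-up toy labels, and picks random oversampling as the resampling step because the abstract does not say which method was used. The helper names `classification_metrics` and `random_oversample` are hypothetical.

```python
import random
from collections import Counter


def classification_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall and error rate for a binary task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    accuracy = (tp + tn) / len(pairs)
    return {
        "accuracy": accuracy,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "error_rate": 1.0 - accuracy,
    }


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen minority-class rows until all classes are equal in size.

    Assumption: the abstract only says "resampling"; random oversampling
    is one simple option chosen here for illustration.
    """
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, l in zip(rows, labels) if l == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


# Toy, made-up labels (not data from the study): 8 negatives, 2 positives.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 0, 1, 1, 0]
metrics = classification_metrics(y_true, y_pred)

# One hypothetical feature per sample, then balance the two classes.
rows = [[0.2 * i] for i in range(10)]
balanced_rows, balanced_labels = random_oversample(rows, y_true)
```

On these toy labels the accuracy is high while recall on the minority class is much lower, which is one reason to report precision and recall alongside accuracy on a skewed dataset. In practice, resampling would normally be applied to the training split only, so that the test split keeps its original class balance.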