Çanakkale Onsekiz Mart University Faculty of Health Sciences Department of Public Health Nursing, Çanakkale, Turkey.
Ege University Health Sciences Institute, İzmir, Turkey.
Public Health Nurs. 2024 Jan-Feb;41(1):175-191. doi: 10.1111/phn.13264. Epub 2023 Nov 23.
The aim of this study is to use machine learning models to predict drinking water quality from a public health nursing approach.
Machine learning study.
"Water Quality Dataset" was used in the study. The dataset contains physical and chemical measurements of water quality for 2400 different water bodies. The process consists of four stages: Data processing with Synthetic Minority Oversampling Technique, hyperparameter tuning with 10-fold cross-validation, modeling and comparative analysis. 80% of the dataset is allocated as training data and 20% as test data. ML models logistic regression, K-nearest neighbor, support vector machine, random forest, XGBoost, AdaBoost Classifier, Decision Tree algorithms were used for water quality prediction. Accuracy, precision, recall, F1 score and AUC performance metrics of ML models were compared. To evaluate the performance of the models, 10-fold cross-validation was used and a comparative analysis was performed. The p-values of the models were also compared.
In this study, where drinking water quality was predicted with seven different ML algorithms, XGBoost and Random Forest were the best classification models on all performance metrics. There is a significant difference in all ML algorithms according to the p-value. The H0 hypothesis is accepted for these algorithms. According to the H0 hypothesis, there is no difference between actual values and predicted values.
In conclusion, the use of ML models to predict drinking water quality can help nurses greatly improve access to clean water, which is a human right, become more knowledgeable about water quality, and protect the health of individuals.