
Screening priority pesticides for drinking water quality regulation and monitoring by machine learning: Analysis of factors affecting detectability.

Affiliations

Graduate School of Engineering, Hokkaido University, N13W8, Sapporo, 060-8628, Japan.

Faculty of Engineering, Hokkaido University, N13W8, Sapporo, 060-8628, Japan.

Publication information

J Environ Manage. 2023 Jan 15;326(Pt A):116738. doi: 10.1016/j.jenvman.2022.116738. Epub 2022 Nov 11.

Abstract

Properly selecting which new contaminants to regulate or monitor, before regulation or monitoring is implemented, is an important issue for regulators and water supply utilities. Herein, we constructed and evaluated machine learning models for predicting the detectability (detection/non-detection) of pesticides in surface water used as a drinking water source. Both classification and regression models were constructed with each of Random Forest, XGBoost, and LightGBM; of these, the LightGBM classification model had the highest prediction accuracy. Furthermore, it outperformed the detectability index method, which is based on runoff models from previous studies, on all of Recall, Precision, and F-measure. Regardless of the type of machine learning model, the number of annual measurements, the sales quantity of pesticides for rice paddy fields, and water quality guideline values were the most important model features (explanatory variables). Analysis of the features' impact suggested the presence of a threshold (or range) above which detectability increased. In addition, when a feature (e.g., quantity of pesticide sales) exceeded the threshold beyond which it increased the likelihood of detection, other features also acted synergistically on detectability. The proportions of false positives and false negatives varied depending on the features used. The advantage of the machine learning models is their ability to represent nonlinear and complex relationships between features and pesticide detectability that existing risk scoring methods cannot represent.
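The comparison above rests on Recall, Precision, and F-measure for the "detected" class. The sketch below shows how these follow from counts of true positives, false positives, and false negatives. It assumes labels are encoded as 1 = detected and 0 = not detected. That encoding and the helper name `detection_metrics` are hypothetical illustrations, not the authors' code.

```python
from typing import Dict, Sequence


def detection_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Precision, Recall and F-measure for the positive ('detected') class.

    Hypothetical helper: assumes 1 = detected, 0 = not detected.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    # Count outcomes for the positive class only.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)  # false negatives
    # Guard against division by zero when a class is never predicted/observed.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F-measure: harmonic mean of precision and recall.
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    return {"tp": tp, "fp": fp, "fn": fn,
            "precision": precision, "recall": recall, "f_measure": f_measure}


if __name__ == "__main__":
    observed = [1, 1, 1, 0, 0, 0, 1, 0]   # illustrative labels, not study data
    predicted = [1, 0, 1, 0, 1, 0, 1, 0]
    print(detection_metrics(observed, predicted))
```

For the illustrative labels in the usage example there are 3 true positives, 1 false positive, and 1 false negative, so Precision, Recall, and F-measure are all 0.75. Because the F-measure is a harmonic mean, a model cannot score well on it by trading many false positives for few false negatives, or the reverse.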
