Interpretability Versus Accuracy: A Comparison of Machine Learning Models Built Using Different Algorithms, Performance Measures, and Features to Predict E. coli Levels in Agricultural Water

Author information

Weller Daniel L, Love Tanzy M T, Wiedmann Martin

Affiliations

Department of Biostatistics and Computational Biology, University of Rochester, Rochester, NY, United States.

Department of Food Science, Cornell University, Ithaca, NY, United States.

Publication information

Front Artif Intell. 2021 May 14;4:628441. doi: 10.3389/frai.2021.628441. eCollection 2021.

Abstract

Since E. coli is considered a fecal indicator in surface water, government water quality standards and industry guidance often rely on E. coli monitoring to identify when there is an increased risk of pathogen contamination of water used for produce production (e.g., for irrigation). However, studies have indicated that E. coli testing can present an economic burden to growers and that time lags between sampling and obtaining results may reduce the utility of these data. Models that predict E. coli levels in agricultural water may provide a mechanism for overcoming these obstacles. Thus, this proof-of-concept study uses previously published datasets to train, test, and compare E. coli predictive models using multiple algorithms and performance measures. Since the collection of different feature data carries specific costs for growers, predictive performance was compared for models built using different feature types [geospatial, water quality, stream traits, and/or weather features]. Model performance was assessed against baseline regression models. Model performance varied considerably, with root-mean-squared errors ranging between 0.37 and 1.03 and Kendall's Tau ranging between 0.07 and 0.55. Overall, models that included turbidity, rain, and temperature outperformed all other models regardless of the algorithm used. Turbidity and weather factors were also found to drive model accuracy even when other feature types were included in the model. These findings confirm previous conclusions that machine learning models may be useful for predicting when, where, and at what level E. coli (and associated hazards) are likely to be present in preharvest agricultural water sources. This study also identifies specific algorithm-predictor combinations that should be the foci of future efforts to develop deployable models (i.e., models that can be used to guide on-farm decision-making and risk mitigation). When deploying E. coli predictive models in the field, it is important to note that past research indicates an inconsistent relationship between E. coli levels and foodborne pathogen presence. Thus, models that predict E. coli levels in agricultural water may be useful for assessing fecal contamination status and ensuring compliance with regulations but should not be used to assess the risk that specific pathogens of concern (e.g., Salmonella) are present.
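
The two performance measures named in the abstract can be illustrated with a small sketch. This is not the authors' code: the data values are hypothetical, the baseline here is a simple mean predictor rather than the baseline regression models used in the study, and the rank correlation is computed as the plain tau-a variant (no tie correction), which may differ from the variant the study reports.

```python
import math
from itertools import combinations


def rmse(observed, predicted):
    """Root-mean-squared error between observed and predicted values."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def kendall_tau_a(observed, predicted):
    """Kendall's tau-a: (concordant - discordant) pairs / all pairs.

    Tied pairs count as neither concordant nor discordant.
    """
    pairs = list(combinations(range(len(observed)), 2))
    if not pairs:
        raise ValueError("need at least two observations")
    score = 0
    for i, j in pairs:
        product = (observed[i] - observed[j]) * (predicted[i] - predicted[j])
        if product > 0:
            score += 1
        elif product < 0:
            score -= 1
    return score / len(pairs)


# Hypothetical log10 E. coli levels (illustrative only, not study data).
observed = [1.2, 2.5, 0.8, 3.1, 1.9]
predicted = [1.0, 2.0, 1.1, 2.8, 2.2]

# Assumed stand-in baseline: always predict the observed mean.
mean_level = sum(observed) / len(observed)
baseline = [mean_level] * len(observed)

print("model    RMSE:", rmse(observed, predicted),
      "tau:", kendall_tau_a(observed, predicted))
print("baseline RMSE:", rmse(observed, baseline),
      "tau:", kendall_tau_a(observed, baseline))
```

Lower RMSE means smaller prediction errors, while a higher Kendall's Tau means the model ranks samples more consistently with the observations. A constant baseline carries no ranking information, which is why the two measures can lead to different conclusions about the same model.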

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9617/8160515/101d14d3051d/frai-04-628441-g001.jpg
