School of Mathematics and Statistics, University of Canterbury, Private Bag 4800, Christchurch 8140, New Zealand; Institute of Environmental Science and Research, ESR, PO Box 29181, Christchurch 8540, New Zealand.
Institute of Environmental Science and Research, ESR, PO Box 29181, Christchurch 8540, New Zealand.
J Environ Manage. 2018 Jan 15;206:910-919. doi: 10.1016/j.jenvman.2017.11.049. Epub 2017 Dec 5.
Exposure to contaminated water while swimming, boating or taking part in other recreational activities can cause gastrointestinal and respiratory disease. Water bodies often experience rapid fluctuations in water quality, so it is vital to predict these fluctuations accurately and in time to minimise the population's exposure to pathogenic organisms. E. coli is commonly used as an indicator of water quality in freshwater, and higher counts of E. coli are associated with an increased risk of illness. In this case study, we compare the performance of a wide range of statistical models in predicting water quality, via E. coli levels, using weekly data collected over the summer months from 2006 to 2014 at the recreational site on the Oreti River in Wallacetown, New Zealand. The models include a naive model, multiple linear regression, dynamic regression, regression tree, Markov chain, classification tree, random forests, multinomial logistic regression, discriminant analysis and Bayesian network. The results show that the Bayesian network was superior to all the other models. Overall, it had a leave-one-out and k-fold cross-validation error rate of 21%, while predicting the majority of instances of E. coli levels classified as unsafe by the Microbiological Water Quality Guidelines for Marine and Freshwater Recreational Areas 2003, New Zealand. Because Bayesian networks are also flexible in handling missing data and outliers and allow for continuous updating in real time, we have found them to be a promising tool and plan, in the future, to extend the analysis beyond the current case study site.
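The leave-one-out and k-fold cross-validation error rates mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code. The function names, the "safe"/"unsafe" labels, the interleaved fold assignment and the toy data are all assumptions made for illustration, and a simple majority-class baseline stands in for the models compared in the study.

```python
from collections import Counter
from typing import Callable, Sequence

Label = str


def majority_class(train_labels: Sequence[Label]) -> Label:
    """Hypothetical baseline: always predict the most common training label."""
    return Counter(train_labels).most_common(1)[0][0]


def k_fold_error_rate(
    labels: Sequence[Label],
    k: int,
    fit_predict: Callable[[Sequence[Label]], Label] = majority_class,
) -> float:
    """Fraction of held-out observations misclassified across k folds."""
    n = len(labels)
    if not 2 <= k <= n:
        raise ValueError("k must satisfy 2 <= k <= number of observations")
    errors = 0
    for fold in range(k):
        # Interleaved folds (an assumption): observation i goes to fold i % k.
        test_idx = set(range(fold, n, k))
        train = [labels[i] for i in range(n) if i not in test_idx]
        prediction = fit_predict(train)
        errors += sum(labels[i] != prediction for i in test_idx)
    return errors / n


def leave_one_out_error_rate(
    labels: Sequence[Label],
    fit_predict: Callable[[Sequence[Label]], Label] = majority_class,
) -> float:
    """Leave-one-out is k-fold cross-validation with k equal to n."""
    return k_fold_error_rate(labels, k=len(labels), fit_predict=fit_predict)


# Toy, made-up weekly classifications; NOT data from the study.
toy_labels = [
    "safe", "safe", "unsafe", "safe", "safe",
    "safe", "unsafe", "safe", "safe", "unsafe",
]

if __name__ == "__main__":
    print("leave-one-out error:", leave_one_out_error_rate(toy_labels))
    print("5-fold error:", k_fold_error_rate(toy_labels, k=5))
```

On this toy list both estimates equal 0.3, and every error is an "unsafe" week that the baseline misses. This is one reason an overall error rate is usefully reported alongside how many unsafe instances a model predicts.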