Water Quality Research and Development, Southern Nevada Water Authority, 1299 Burkholder Blvd., Henderson, United States; Carollo Engineers, Inc., 8911 N Capital of Texas Hwy, Austin, TX 78759, United States.
Water Quality Research and Development, Southern Nevada Water Authority, 1299 Burkholder Blvd., Henderson, United States.
Water Res. 2021 Oct 1;204:117556. doi: 10.1016/j.watres.2021.117556. Epub 2021 Aug 13.
Water quality events, such as increases in stormwater or wastewater effluent in drinking water sources, pose hazards to drinking water consumers. Stormwater and wastewater effluent enter Lake Mead, an important drinking water source in the southwest USA, via the Las Vegas Wash. Previous studies have applied machine learning and online instruments to detect contamination in water distribution systems. However, alert systems at drinking water intakes would provide more time for corrective action. An array of online instruments measuring pH, conductivity, redox potential, turbidity, temperature, tryptophan-like fluorescence, UV absorbance (UVA), total organic carbon (TOC), and chlorophyll-a was fed raw water directly from Lake Mead. Wastewater effluent, dry weather Las Vegas Wash, and storm-impacted Las Vegas Wash samples were blended into the instrument inlets at known ratios to simulate three types of adverse water quality events. Data preprocessing was conducted to correct for diurnal patterns or instrument drift. Supervised machine learning was conducted using previously published models in R. Ninety-nine models were screened on the raw data. Eight high-performing models were evaluated in depth and optimized. Weighted k-Nearest Neighbors, Single C5.0 Ruleset, Mixture Discriminant Analysis, and an ensemble of these three models had accuracy over 97% when assigning test set data among three classes (Normal, Event, or Maintenance). The ensemble detected all event types at the earliest timepoint and had only one false positive that was not a lag error (a lag error being a false positive that consecutively follows a true positive). When the Maintenance class was omitted, the AdaBoost model had over 99% test set accuracy and zero false positives that were not lag errors. Data preprocessing was beneficial, but the optimal methods were model-specific. All nine water quality variables were useful for most models, but UVA and turbidity were the most important.
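The two evaluation ideas in the abstract, combining three classifiers into an ensemble and separating lag errors from other false positives, can be illustrated with a minimal Python sketch. This is not the authors' code: the study used published models in R, and the abstract does not say how the ensemble combined its members. The sketch assumes a simple majority vote, a tie-break toward "Event", and that a run of consecutive false positives directly following a true positive all count as lag errors. The function names `majority_vote` and `split_false_positives` are hypothetical.

```python
from collections import Counter


def majority_vote(predictions, tie_break="Event"):
    """Combine the labels from several models for one timepoint.

    Assumption: plain majority vote; ties resolve to `tie_break`
    (a conservative choice made here, not taken from the paper).
    """
    counts = Counter(predictions).most_common()
    best = counts[0][1]
    tied = [label for label, n in counts if n == best]
    if len(tied) == 1:
        return tied[0]
    return tie_break if tie_break in tied else tied[0]


def split_false_positives(truth, predicted, positive="Event"):
    """Return (lag_errors, other_false_positives) as lists of indices.

    Assumption: a false positive is a lag error when it belongs to an
    unbroken run of positive predictions that began with a true positive.
    """
    lag, other = [], []
    chain = False  # True while the current run of positives started from a true positive
    for i, (t, p) in enumerate(zip(truth, predicted)):
        if p == positive and t == positive:
            chain = True  # true positive starts or continues a chain
        elif p == positive:
            (lag if chain else other).append(i)  # false positive
        else:
            chain = False  # a non-positive prediction breaks the chain
    return lag, other


# Made-up labels for seven consecutive timepoints (not data from the study).
truth = ["Normal", "Event", "Event", "Normal", "Normal", "Normal", "Normal"]
pred = ["Normal", "Event", "Event", "Event", "Normal", "Event", "Normal"]
lag_errors, other_fp = split_false_positives(truth, pred)
print(lag_errors, other_fp)
```

In this made-up series, the alarm at index 3 trails a correctly detected event and is treated as a lag error, while the isolated alarm at index 5 is the kind of false positive the abstract counts separately.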