Department of Civil & Environmental Engineering, Stanford University, 473 Via Ortega, Stanford, Palo Alto 94305, California, United States.
Environ Sci Technol. 2021 Feb 2;55(3):1908-1918. doi: 10.1021/acs.est.0c06742. Epub 2021 Jan 20.
To reduce the incidence of recreational waterborne illness, fecal indicator bacteria (FIB) are measured to assess water quality and inform beach management. Recently, predictive FIB models have been used to aid managers in making beach posting and closure decisions. However, those predictive models must be trained using rich historical data sets consisting of FIB and environmental data that span years, and many beaches lack such data sets. Here, we investigate whether water quality data collected during discrete, short-duration, high-frequency beach sampling events (e.g., samples collected at sub-hourly intervals for 24-48 h) are sufficient to train predictive models that can be used for beach management. We use data collected during six high-frequency sampling events at three California marine beaches and train a total of 126 models using common data-driven techniques. Tide, solar irradiation, water temperature, significant wave height, and offshore wind speed were found to be the most important environmental variables in the models. We validate the predictive performance of the models using withheld data. Random forests are consistently the top-performing model type. Overall, we find that data-driven models trained using high-frequency FIB and environmental data perform well at predicting water quality and can be used to inform public health decisions at beaches.
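The validation step described in the abstract (training on high-frequency data and scoring on withheld data) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the data are synthetic, the exceedance threshold is hypothetical, the function names are invented here, and the model is a single-split decision stump on tide, a far simpler stand-in for the random forests the study reports. It is not the authors' method or data.

```python
import math
import random

# Hypothetical exceedance threshold on log10(FIB); NOT a value from the study.
THRESHOLD_LOG_FIB = 1.5


def make_synthetic_event(n_samples=96, seed=0):
    """Build a synthetic 48 h event sampled every 30 min (illustrative only)."""
    rng = random.Random(seed)
    rows = []
    for i in range(n_samples):
        hour = i * 0.5
        tide = math.sin(2 * math.pi * hour / 12.42)  # toy semidiurnal tide
        solar = max(0.0, math.sin(2 * math.pi * (hour - 6) / 24))  # daylight proxy
        # Assumed toy relationship: FIB is higher at low tide and in the dark.
        log_fib = 1.5 - 0.6 * tide - 0.8 * solar + rng.gauss(0, 0.2)
        rows.append({"hour": hour, "tide": tide, "solar": solar, "log_fib": log_fib})
    return rows


def chronological_split(rows, train_fraction=0.7):
    """Train on the early part of the event, withhold the later part."""
    cut = int(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]


def fit_stump(rows, feature, threshold=THRESHOLD_LOG_FIB):
    """Pick the cutoff and direction on one feature that best fit the training labels."""
    labels = [r["log_fib"] > threshold for r in rows]
    best = None
    for cutoff in sorted({r[feature] for r in rows}):
        for below_is_exceed in (True, False):
            preds = [(r[feature] <= cutoff) == below_is_exceed for r in rows]
            acc = sum(p == y for p, y in zip(preds, labels)) / len(rows)
            if best is None or acc > best[0]:
                best = (acc, cutoff, below_is_exceed)
    _, cutoff, below_is_exceed = best
    # The returned model predicts True when an exceedance is expected.
    return lambda r: (r[feature] <= cutoff) == below_is_exceed


def score(model, rows, threshold=THRESHOLD_LOG_FIB):
    """Count hits and misses on withheld rows and summarise them."""
    tp = fp = tn = fn = 0
    for r in rows:
        pred, actual = model(r), r["log_fib"] > threshold
        if pred and actual:
            tp += 1
        elif pred and not actual:
            fp += 1
        elif not pred and actual:
            fn += 1
        else:
            tn += 1
    return {
        "tp": tp, "fp": fp, "tn": tn, "fn": fn,
        "accuracy": (tp + tn) / len(rows),
        "sensitivity": tp / (tp + fn) if (tp + fn) else None,
        "specificity": tn / (tn + fp) if (tn + fp) else None,
    }


event = make_synthetic_event()
train, withheld = chronological_split(event)
model = fit_stump(train, "tide")
print(score(model, withheld))
```

The split is chronological rather than random so that the withheld samples are later in time than anything the model was fitted on, which is closer to how a model would be used for beach management. With real monitoring data, the stump would be replaced by whatever model type is being compared.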