Machine learning for anomaly detection in cyanobacterial fluorescence signals.

Authors

Almuhtaram Husein, Zamyadi Arash, Hofmann Ron

Affiliations

Department of Civil and Mineral Engineering, University of Toronto, Toronto ON M5S 1A4 Canada.

Water RA Melbourne based position hosted by Melbourne Water, 990 La Trobe St, Docklands VIC 3008, Australia; BGA Innovation Hub and Water Research Centre, School of Civil and Environmental Engineering, University of New South Wales (UNSW), Sydney, NSW 2052, Australia.

Publication information

Water Res. 2021 Jun 1;197:117073. doi: 10.1016/j.watres.2021.117073. Epub 2021 Mar 19.

DOI:10.1016/j.watres.2021.117073

PMID:33784609

Abstract

Many drinking water utilities drawing from waters susceptible to harmful algal blooms (HABs) are implementing monitoring tools that can alert them to the onset of blooms. Some have invested in fluorescence-based online monitoring probes to measure phycocyanin, a pigment found in cyanobacteria, but it is not clear how to best use the data generated. Previous studies have focused on correlating phycocyanin fluorescence and cyanobacteria cell counts. However, not all utilities collect cell count data, making this method impossible to apply in some cases. Instead, this paper proposes a novel approach to determine when a utility needs to respond to a HAB based on machine learning by identifying anomalies in phycocyanin fluorescence data without the need for corresponding cell counts or biovolume. Four widespread and open source algorithms are evaluated on data collected at four buoys in Lake Erie from 2014 to 2019: local outlier factor (LOF), One-Class Support Vector Machine (SVM), elliptic envelope, and Isolation Forest (iForest). When trained on standardized historical data from 2014 to 2018 and tested on labelled 2019 data collected at each buoy, the One-Class SVM and elliptic envelope models both achieve a maximum average F1 score of 0.86 among the four datasets. Therefore, One-Class SVM and elliptic envelope are promising algorithms for detecting potential HABs using fluorescence data only.
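The evaluation workflow described in the abstract (standardize using historical data, flag anomalies in a held-out year, score against labels with F1) can be illustrated with a minimal sketch. This is not the authors' code and does not implement any of the four algorithms evaluated in the paper: a simple z-score threshold stands in for the detector, and all readings, labels, and the threshold value are hypothetical numbers chosen for illustration only.

```python
from statistics import mean, stdev


def standardize(values, mu, sigma):
    """Scale readings with the mean and standard deviation of the training period."""
    return [(v - mu) / sigma for v in values]


def f1_score(y_true, y_pred):
    """F1 = harmonic mean of precision and recall for the anomaly (True) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical phycocyanin fluorescence readings (arbitrary units).
# "train" plays the role of the historical years, "test" the labelled held-out year.
train = [1.0, 1.2, 0.9, 1.1, 1.0, 0.8, 1.3, 1.1, 0.9, 1.2]
test = [1.0, 1.1, 4.5, 5.2, 0.9, 1.4]
test_labels = [False, False, True, True, False, True]  # True = labelled anomaly

# Standardize the test data with statistics from the training period only,
# so no information from the held-out year leaks into the scaling.
mu, sigma = mean(train), stdev(train)
z_test = standardize(test, mu, sigma)

# Stand-in detector (an assumption, not one of the paper's models):
# flag any reading more than THRESHOLD standard deviations from the training mean.
THRESHOLD = 3.0
predicted = [abs(z) > THRESHOLD for z in z_test]

score = f1_score(test_labels, predicted)
print(f"F1 on held-out data: {score:.2f}")
```

In this toy data the detector catches the two large excursions but misses the milder labelled anomaly, which lowers recall and therefore F1. In practice, the threshold rule would be replaced by a model such as One-Class SVM or elliptic envelope, fitted on the standardized training data.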

