Department of Computer Science, School of Mathematics, Statistics and Computer Science, College of Science, University of Tehran, Tehran, Iran.
The Graduate Center, The City University of New York, New York, NY, United States.
J Environ Manage. 2024 Jun;362:121274. doi: 10.1016/j.jenvman.2024.121274. Epub 2024 Jun 4.
Cyanobacteria are dominant microorganisms in aquatic environments and pose significant risks to public health because they produce toxins in drinking water reservoirs. Traditional water quality assessments of the abundance of toxigenic genera in water samples are both time-consuming and error-prone, highlighting the urgent need for a fast and accurate automated approach. This study addresses this gap by introducing a novel public dataset, TCB-DS (Toxigenic Cyanobacteria Dataset), comprising 2593 microscopic images of 10 toxigenic cyanobacterial genera, together with a two-stage automated system for identifying these genera. In the first stage, a Convolutional Neural Network (CNN) was employed as a feature extractor, with MobileNet emerging as the optimal choice after comparison with other popular architectures such as MobileNetV2 and VGG. In the second stage, multiple classification approaches were tested on the extracted features; the experimental results indicate that a Fully Connected Neural Network (FCNN) performed best, with a weighted accuracy and F1-score of 94.79% and 94.91%, respectively. The highest macro accuracy and F1-score, 90.17% and 87.64%, were obtained using MobileNetV2 as the feature extractor and FCNN as the classifier. These results demonstrate that the proposed approach can be employed as an automated screening tool for identifying toxigenic cyanobacteria, with practical implications for water quality control, replacing the traditional estimate made by a lab operator from microscopic observations. The dataset and code of this paper are publicly available at https://github.com/iman2693/CTCB.
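The abstract reports both weighted and macro F1-scores, which can differ noticeably when the classes are imbalanced. The following is a minimal sketch of how the two averages are commonly computed; it is not taken from the paper's code. The function names and the toy labels `genus_a` and `genus_b` are hypothetical and chosen only for illustration.

```python
from collections import Counter


def per_class_f1(y_true, y_pred):
    """Return a dict mapping each label to its one-vs-rest F1-score."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = {}
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1: every class counts equally."""
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores)


def weighted_f1(y_true, y_pred):
    """Mean of per-class F1 weighted by the number of true samples per class."""
    scores = per_class_f1(y_true, y_pred)
    support = Counter(y_true)
    return sum(scores[label] * support[label] for label in scores) / len(y_true)


# Hypothetical, imbalanced toy data: 8 samples of one genus, 2 of another.
y_true = ["genus_a"] * 8 + ["genus_b"] * 2
y_pred = ["genus_a"] * 8 + ["genus_a", "genus_b"]

macro = macro_f1(y_true, y_pred)
weighted = weighted_f1(y_true, y_pred)
print(f"macro F1 = {macro:.4f}, weighted F1 = {weighted:.4f}")
```

In this toy case the minority class is classified poorly, so the macro average falls below the weighted average, the same direction as the gap between the macro and weighted figures in the abstract.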