Department of Ecoscience, Aarhus University, Aarhus, Denmark.
Arctic Research Centre, Aarhus University, Aarhus, Denmark.
PeerJ. 2022 Aug 23;10:e13837. doi: 10.7717/peerj.13837. eCollection 2022.
Image-based methods for species identification offer cost-efficient solutions for biomonitoring. This is particularly relevant for invertebrate studies, where sorting, identifying, and counting the individual specimens in bulk samples often represents an insurmountable workload. On the other hand, image-based classification using deep learning tools has strict requirements for the amount of training data, which is often a limiting factor. Here, we examine how classification accuracy increases with the amount of training data using the BIODISCOVER imaging system, which was constructed for image-based classification and biomass estimation of invertebrate specimens. We use a balanced dataset of 60 specimens of each of 16 taxa of freshwater macroinvertebrates to systematically quantify how the classification performance of a convolutional neural network (CNN) increases for individual taxa and for the overall community as the number of specimens used for training is increased. We show a striking 99.2% classification accuracy when the CNN (EfficientNet-B6) is trained on 50 specimens of each taxon, and we show that the lower classification accuracy of models trained on less data is particularly evident for morphologically similar species placed within the same taxonomic order. Even with as few as 15 specimens per taxon used for training, classification accuracy reached 97%. Our results add to a recent body of literature showing the huge potential of image-based methods and deep learning for specimen-based research, and furthermore offer a perspective on future automated approaches for deriving ecological data from bulk arthropod samples.
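The experimental design described above (a balanced dataset, with the number of training specimens per taxon varied) can be illustrated with a minimal sketch. It is not the authors' code. The function names `balanced_split` and `accuracy`, the held-out size of 10 specimens per taxon, and the fixed random seed are all assumptions made for illustration; the abstract states only that 60 specimens per taxon were available and that up to 50 were used for training.

```python
import random


def balanced_split(specimens_by_taxon, n_train, n_test=10, seed=0):
    """Pick n_train training and n_test test specimens for every taxon.

    Assumption (not from the abstract): the test specimens are held out
    first, so the same test set is reused for every training-set size.
    """
    rng = random.Random(seed)
    train, test = {}, {}
    # Sorted iteration keeps the random draws reproducible across calls.
    for taxon in sorted(specimens_by_taxon):
        specimens = list(specimens_by_taxon[taxon])
        if n_train + n_test > len(specimens):
            raise ValueError(f"not enough specimens for taxon {taxon!r}")
        rng.shuffle(specimens)
        test[taxon] = specimens[:n_test]
        train[taxon] = specimens[n_test:n_test + n_train]
    return train, test


def accuracy(true_labels, predicted_labels):
    """Fraction of specimens whose predicted taxon matches the true taxon."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    if not true_labels:
        raise ValueError("no labels given")
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)
```

With a fixed seed, calling `balanced_split` with increasing `n_train` yields nested training sets and an unchanged test set, so differences in accuracy between runs reflect training-set size rather than a different choice of test specimens. Training and evaluating the CNN itself is outside the scope of this sketch.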