Istituto Italiano di Tecnologia, Genoa, Italy.
MaLGa-DIBRIS, Università degli studi di Genova, Genoa, Italy.
Sci Rep. 2023 Jun 27;13(1):10443. doi: 10.1038/s41598-023-37627-7.
Plankton microorganisms play a crucial role in the aquatic food web. Recently, it has been proposed to use plankton as biosensors, since they can react to even minimal perturbations of the aquatic environment with specific physiological changes, which may lead to alterations in morphology and behavior. Nowadays, the development of high-resolution in-situ automatic acquisition systems allows the research community to obtain large amounts of plankton image data. Fundamental examples are the ZooScan and Woods Hole Oceanographic Institution (WHOI) datasets, comprising up to millions of plankton images. However, obtaining unbiased annotations is expensive in terms of both time and resources, and in-situ acquired datasets generally suffer from severe class imbalance, with only a few images available for several species. Transfer learning is a popular solution to these challenges, with ImageNet1K being the most used source dataset for pre-training. On the other hand, datasets like ZooScan and WHOI represent a valuable opportunity to compare large-scale in-domain plankton source datasets against out-of-domain ones, in terms of performance on the task at hand.

In this paper, we design three transfer learning pipelines for plankton image classification, with the aim of comparing in-domain and out-of-domain transfer learning on three popular benchmark plankton datasets. The general framework consists of fine-tuning a pre-trained model on a plankton target dataset. In the first pipeline, the model is pre-trained from scratch on a large-scale plankton dataset; in the second, it is pre-trained on a large-scale natural image dataset (ImageNet1K or ImageNet22K); in the third, a two-stage fine-tuning is implemented (ImageNet → large-scale plankton dataset → target plankton dataset). Our results show that out-of-domain ImageNet22K pre-training outperforms the in-domain plankton ones, with an average boost in test accuracy of around 6%.
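The two-stage pipeline described above (ImageNet → large-scale plankton dataset → target plankton dataset) can be sketched schematically in PyTorch. This is a minimal illustration only: the tiny stand-in backbone, random placeholder batches, and class counts are all hypothetical and do not reflect the paper's actual models or datasets.

```python
import torch
import torch.nn as nn

def make_backbone():
    # Stand-in for an ImageNet pre-trained backbone (in the paper,
    # a Vision Transformer or ConvNeXt would be loaded with its weights).
    return nn.Sequential(
        nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    )

def fine_tune(backbone, head, loader, epochs=1, lr=1e-3):
    # Attach a fresh classification head for the new label set and
    # update all weights (full fine-tuning) on the new dataset.
    net = nn.Sequential(backbone, head)
    opt = torch.optim.Adam(net.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            loss_fn(net(x), y).backward()
            opt.step()
    return backbone  # the backbone carries the transferred weights forward

backbone = make_backbone()  # pretend: ImageNet pre-trained weights
# Stage 1: fine-tune on a large-scale plankton source dataset (placeholder batch).
src = [(torch.randn(4, 3, 32, 32), torch.randint(0, 10, (4,)))]
backbone = fine_tune(backbone, nn.Linear(8, 10), src)
# Stage 2: fine-tune on the small target plankton dataset with its own label set.
tgt = [(torch.randn(4, 3, 32, 32), torch.randint(0, 5, (4,)))]
backbone = fine_tune(backbone, nn.Linear(8, 5), tgt)
```

Swapping the head between stages while reusing the backbone is what lets the intermediate plankton pre-training transfer to a target dataset with a different set of species labels.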
In the next part of this work, we adopt three ImageNet22K pre-trained Vision Transformers and one ConvNeXt, obtaining results on par with (or slightly superior to) the state-of-the-art, which relies on ensembles of CNN models, while using a single model. Finally, we design and test an ensemble of our Vision Transformers and the ConvNeXt, outperforming the existing state-of-the-art on plankton image classification on all three target datasets. To support scientific community contributions and further research, our implementation is open-source and available at https://github.com/Malga-Vision/plankton_transfer.
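One common way to combine several classifiers, in the spirit of the ViT + ConvNeXt ensemble mentioned above, is soft voting: average the per-model softmax probabilities and take the argmax. The sketch below is a generic illustration with made-up logits, not the paper's actual ensembling procedure or numbers.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the last axis.
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def ensemble_predict(logits_per_model):
    # logits_per_model: (n_models, n_samples, n_classes).
    # Average softmax probabilities across models, then argmax per sample.
    probs = softmax(np.asarray(logits_per_model, dtype=float))
    return probs.mean(axis=0).argmax(axis=-1)

# Three hypothetical models, two samples, three classes.
logits = [
    [[2.0, 1.0, 0.1], [0.2, 0.1, 2.5]],
    [[1.5, 1.4, 0.2], [0.0, 0.3, 2.0]],
    [[0.9, 1.5, 0.3], [0.1, 0.2, 1.8]],
]
preds = ensemble_predict(logits)  # one predicted class index per sample
```

Averaging probabilities rather than hard votes lets a confident model outweigh two uncertain ones, which is often why such ensembles beat any single member.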