College of Information Engineering, Dalian Ocean University, Dalian, China.
Dalian Key Laboratory of Smart Fisheries, Dalian Ocean University, Dalian, China.
J Fish Biol. 2024 Sep;105(3):721-734. doi: 10.1111/jfb.15793. Epub 2024 Jun 9.
With the continuous development of green, high-quality aquaculture technology, the industrialization of aquaculture has advanced, and automation, intelligence, and precision have become the future development trends of the industry. Fish individual recognition goes beyond species classification to distinguish individual fish, providing basic support for fish disease analysis, bait feeding, and precision aquaculture. However, the high similarity among fish individuals and the complexity of the underwater environment present great challenges to individual recognition. To address these problems, we propose a novel fish individual recognition method for precision farming that rethinks the knowledge distillation strategy and the patch-partitioning method of the vision transformer. The method uses a conventional convolutional neural network as the teacher model and introduces a teacher token to guide the student model in learning fish texture features. We propose stride patch embedding, which expands the receptive field and thereby enhances the local continuity of the image, and self-attention pruning, which discards unimportant tokens to reduce model computation. Experimental results on the DlouFish dataset show that the proposed method reaches an accuracy of 93.19%, a 3.25% improvement over ECA Resnet152, and also outperforms other vision transformer models.
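The abstract does not include an implementation, so the following is a minimal PyTorch sketch of the two transformer-side ideas as described above: overlapping "stride" patch embedding (a patch window larger than its stride, so neighboring patches share pixels and local continuity is preserved) and attention-based token pruning (patch tokens receiving the lowest class-token attention are dropped). All module names, dimensions, and the keep-ratio parameter are illustrative assumptions, not the authors' code.

```python
# Hypothetical sketch of stride patch embedding and self-attention token
# pruning on a DeiT-style ViT backbone. Names and dimensions are assumptions.
import torch
import torch.nn as nn


class StridePatchEmbed(nn.Module):
    """Overlapping patch embedding: kernel larger than stride, so adjacent
    patches share pixels and local image continuity is retained."""

    def __init__(self, in_ch=3, embed_dim=384, patch=16, stride=12):
        super().__init__()
        self.proj = nn.Conv2d(in_ch, embed_dim, kernel_size=patch,
                              stride=stride, padding=patch // 2)

    def forward(self, x):                    # x: (B, C, H, W)
        x = self.proj(x)                     # (B, D, H', W')
        return x.flatten(2).transpose(1, 2)  # (B, N, D) token sequence


def prune_tokens(tokens, cls_attn, keep_ratio=0.7):
    """Keep the patch tokens that receive the highest attention from the
    class token; drop the rest to cut computation.

    tokens:   (B, N, D) patch tokens (class/teacher tokens excluded)
    cls_attn: (B, N) attention weights from the class token to each patch
    keep_ratio is an illustrative hyperparameter, not from the paper.
    """
    n_keep = max(1, int(tokens.shape[1] * keep_ratio))
    idx = cls_attn.topk(n_keep, dim=1).indices            # (B, n_keep)
    idx = idx.unsqueeze(-1).expand(-1, -1, tokens.shape[-1])
    return tokens.gather(1, idx)                          # (B, n_keep, D)


if __name__ == "__main__":
    imgs = torch.randn(2, 3, 224, 224)
    tok = StridePatchEmbed()(imgs)                 # overlapping patch tokens
    attn = torch.rand(tok.shape[0], tok.shape[1])  # stand-in attention scores
    kept = prune_tokens(tok, attn)
    print(tok.shape, "->", kept.shape)
```

The teacher-token distillation described in the abstract would sit on top of such a backbone, presumably in the spirit of DeiT's distillation token, with the CNN teacher's outputs supervising the extra token; how the teacher token is injected and trained is not specified in the abstract.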