Jiang Xing, Zhuang Xiting, Chen Jisheng, Zhang Jian, Zhang Yiwen
School of Tropical Agriculture and Forestry (School of Agricultural and Rural, School of Rural Revitalization), Hainan University, Danzhou 571737, China.
Sensors (Basel). 2024 May 1;24(9):2905. doi: 10.3390/s24092905.
Underwater visual detection technology is crucial for marine exploration and monitoring. Given the growing demand for accurate underwater target recognition, this study introduces a new architecture, YOLOv8-MU, that substantially improves detection accuracy. The model incorporates the large kernel block (LarK block) from UniRepLKNet into the backbone network, enlarging the receptive field without increasing the model's depth. In addition, two modules notably improve detection accuracy and robustness across a variety of biological targets: C2fSTR, which combines the Swin Transformer with the C2f module, and SPPFCSPC_EMA, which combines Cross-Stage Partial Fast Spatial Pyramid Pooling (SPPFCSPC) with attention mechanisms. A fusion block from DAMO-YOLO further strengthens multi-scale feature extraction in the model's neck. Finally, the MPDIoU loss function, which is built around the distances between corresponding box vertices, addresses the difficulties of precise localization and unclear object boundaries in underwater organism detection. On the URPC2019 dataset, YOLOv8-MU achieves an mAP@0.5 of 78.4%, an improvement of 4.0% over the original YOLOv8 model. It also reaches 80.9% on the URPC2020 dataset and 75.5% on the Aquarium dataset, outperforming other models, including YOLOv5 and YOLOv8n. On an improved version of the URPC2019 dataset, it achieves state-of-the-art (SOTA) performance with an mAP@0.5 of 88.1%. Together, these results demonstrate the broad applicability and generalization capability of the proposed architecture across different underwater datasets.
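To make "designed around the vertex distance" concrete, the sketch below computes an MPDIoU-style loss for two axis-aligned boxes. It assumes the commonly stated MPDIoU formulation, MPDIoU = IoU − d₁²/(w² + h²) − d₂²/(w² + h²), where d₁ and d₂ are the distances between the predicted and ground-truth top-left and bottom-right corners and w, h are the input image width and height, with loss = 1 − MPDIoU. The names `Box`, `iou`, and `mpdiou_loss` are hypothetical, and this is an illustrative sketch, not the YOLOv8-MU implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Hypothetical axis-aligned box: (x1, y1) top-left, (x2, y2) bottom-right."""
    x1: float
    y1: float
    x2: float
    y2: float

    def area(self) -> float:
        return max(0.0, self.x2 - self.x1) * max(0.0, self.y2 - self.y1)


def iou(a: Box, b: Box) -> float:
    """Plain intersection-over-union of two boxes."""
    inter_w = max(0.0, min(a.x2, b.x2) - max(a.x1, b.x1))
    inter_h = max(0.0, min(a.y2, b.y2) - max(a.y1, b.y1))
    inter = inter_w * inter_h
    union = a.area() + b.area() - inter
    return inter / union if union > 0 else 0.0


def mpdiou_loss(pred: Box, gt: Box, img_w: float, img_h: float) -> float:
    """Assumed MPDIoU loss: 1 - (IoU - normalized squared corner distances)."""
    norm = img_w ** 2 + img_h ** 2  # normalizer from the input image size (assumption)
    d1_sq = (pred.x1 - gt.x1) ** 2 + (pred.y1 - gt.y1) ** 2  # top-left corners
    d2_sq = (pred.x2 - gt.x2) ** 2 + (pred.y2 - gt.y2) ** 2  # bottom-right corners
    mpdiou = iou(pred, gt) - d1_sq / norm - d2_sq / norm
    return 1.0 - mpdiou


if __name__ == "__main__":
    gt = Box(0, 0, 10, 10)
    print(mpdiou_loss(Box(2, 0, 12, 10), gt, 100, 100))  # shifted prediction
    print(mpdiou_loss(Box(50, 50, 60, 60), gt, 100, 100))  # disjoint prediction
```

Because the corner-distance terms remain informative even when the boxes do not overlap (IoU = 0), the loss still distinguishes near misses from far misses, which is one reason vertex-distance terms can help localization.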