Chen Wenguang, Wang Xiao, Chen Junjie, Sun Jialong, Zha Guozhen
Jiangsu Sanheng Technology Co. Ltd., Changzhou, China.
School of Marine Technology and Geomatics, Jiangsu Ocean University, Lianyungang, China.
PeerJ Comput Sci. 2025 May 29;11:e2923. doi: 10.7717/peerj-cs.2923. eCollection 2025.
Multibeam bathymetry has become an effective underwater target detection method, using echo signals to generate high-resolution water column images (WCIs). However, the gas plume in such images is often affected by the seafloor environment, exhibiting sparse texture and changing motion, which makes traditional detection and segmentation methods time-consuming and labor-intensive. The emergence of convolutional neural networks (CNNs) alleviates this problem, but the local feature extraction of convolutional operations, while capturing detailed information well, cannot adapt to the elongated morphology of the gas plume target, limiting improvements in detection and segmentation accuracy. Inspired by the transformer's ability to achieve global modeling through self-attention, we combine a CNN with a transformer to improve the existing YOLOv7 (You Only Look Once version 7) model. First, we sequentially reduce the ELAN (Efficient Layer Aggregation Networks) structures in the backbone network and verify that using the enhanced feature extraction module only in the deep layers of the network is more effective for recognizing gas plume targets. Then, we propose the C-BiFormer module, which achieves effective collaboration between local feature extraction and global semantic modeling while reducing computational cost, and enhances the model's multi-scale feature extraction capability. Finally, we design two networks of different depths by stacking different numbers of C-BiFormer modules. This enlarges the receptive field, so that the model's detection and segmentation accuracy improve to different degrees. Experimental results show that the improved model is smaller and more accurate than the baseline.
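The abstract does not give implementation details of the C-BiFormer module. As a rough, hypothetical sketch of the general idea it describes (fusing local convolutional feature extraction with global self-attention over the whole feature map), here is a minimal NumPy illustration; the 3x3 kernel, single-head attention, and additive fusion are all assumptions, not the paper's actual architecture:

```python
import numpy as np

def local_conv(x, w):
    """3x3 'same' convolution over a (H, W, C) feature map -> (H, W, C_out).
    Models the CNN branch: each output depends only on a local neighborhood."""
    H, W, C = x.shape
    c_out = w.shape[3]
    xp = np.pad(x, ((1, 1), (1, 1), (0, 0)))       # zero-pad spatial dims
    out = np.zeros((H, W, c_out))
    for i in range(H):
        for j in range(W):
            patch = xp[i:i + 3, j:j + 3, :]        # (3, 3, C) local window
            out[i, j] = np.tensordot(patch, w, axes=([0, 1, 2], [0, 1, 2]))
    return out

def global_attention(x):
    """Single-head self-attention over all spatial positions.
    Models the transformer branch: every position attends to every other,
    which suits elongated targets spanning the image."""
    H, W, C = x.shape
    tokens = x.reshape(H * W, C)                   # flatten spatial dims to tokens
    scores = tokens @ tokens.T / np.sqrt(C)        # (HW, HW) pairwise similarity
    attn = np.exp(scores - scores.max(axis=1, keepdims=True))
    attn /= attn.sum(axis=1, keepdims=True)        # row-wise softmax
    return (attn @ tokens).reshape(H, W, C)

def cbiformer_like_block(x, w):
    """Hypothetical fusion of the two branches by simple addition."""
    return local_conv(x, w) + global_attention(x)

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8, 4))                 # toy (H, W, C) feature map
w = rng.standard_normal((3, 3, 4, 4))              # 3x3 kernel, C_in = C_out = 4
y = cbiformer_like_block(x, w)
print(y.shape)                                     # (8, 8, 4)
```

The attention branch here is quadratic in the number of spatial positions; the actual BiFormer design uses bi-level routing to attend only to a selected subset of regions, which is what makes the module cheaper than dense global attention.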