Combining convolutional neural network with transformer to improve YOLOv7 for gas plume detection and segmentation in multibeam water column images.

Author information

Chen Wenguang, Wang Xiao, Chen Junjie, Sun Jialong, Zha Guozhen

Affiliations

Jiangsu Sanheng Technology Co. Ltd., Changzhou, China.

School of Marine Technology and Geomatics, Jiangsu Ocean University, Lianyungang, China.

Publication information

PeerJ Comput Sci. 2025 May 29;11:e2923. doi: 10.7717/peerj-cs.2923. eCollection 2025.

DOI: 10.7717/peerj-cs.2923

PMID: 40567798

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12192642/

Abstract

Multibeam bathymetry has become an effective method for underwater target detection: it uses echo signals to generate high-resolution water column images (WCIs). However, gas plumes in these images are often affected by the seafloor environment and exhibit sparse texture and changing motion, which makes traditional detection and segmentation methods time-consuming and labor-intensive. Convolutional neural networks (CNNs) have alleviated this problem, but the local feature extraction performed by convolution, while good at capturing fine detail, cannot adapt to the elongated morphology of gas plume targets, which limits further gains in detection and segmentation accuracy. Inspired by the transformer's ability to model global context through self-attention, we combine CNN with the transformer to improve the existing YOLOv7 (You Only Look Once version 7) model. First, we successively reduce the ELAN (Efficient Layer Aggregation Networks) structures in the backbone network and verify that using the enhanced feature extraction module only in the deeper layers of the network is more effective for recognizing gas plume targets. Then, we propose the C-BiFormer module, which achieves effective collaboration between local feature extraction and global semantic modeling while reducing computing resources, and which enhances the model's multi-scale feature extraction capability. Finally, we design two networks of different depths by stacking different numbers of C-BiFormer layers. This enlarges the receptive field and improves the model's detection and segmentation accuracy to different degrees. Experimental results show that the improved model is smaller and more accurate than the baseline.
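The abstract does not describe the internals of C-BiFormer. Going only by its name, the module presumably builds on BiFormer's bi-level routing attention. That mechanism first routes each region of the feature map to its top-k most relevant regions, then runs token-level attention only inside those regions, trading full global attention for lower cost. The pure-Python sketch below illustrates that routing idea on a 1-D token sequence. The function names, the mean-pooled region descriptors, and the 1-D simplification are all assumptions for illustration, not the authors' implementation.

```python
import math


def softmax(xs):
    # Numerically stable softmax over a list of scores.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def mean_vec(vectors):
    # Element-wise mean of a list of equal-length vectors.
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def bi_level_routing_attention(q, k, v, region_size, top_k):
    """Hypothetical 1-D sketch of bi-level routing attention.

    Coarse level: each contiguous region is summarized by the mean of its
    queries/keys, and each query region keeps its top_k most similar key regions.
    Fine level: every token attends only to tokens inside its routed regions.
    """
    n = len(q)
    assert n % region_size == 0, "sequence length must be divisible by region_size"
    num_regions = n // region_size
    regions = [list(range(r * region_size, (r + 1) * region_size)) for r in range(num_regions)]

    # Region-level descriptors (mean pooling is an assumption of this sketch).
    q_region = [mean_vec([q[i] for i in idx]) for idx in regions]
    k_region = [mean_vec([k[i] for i in idx]) for idx in regions]

    scale = 1.0 / math.sqrt(len(q[0]))
    out = [None] * n
    for r, idx in enumerate(regions):
        # Region-to-region affinity; keep only the top_k key regions.
        affinity = [dot(q_region[r], k_region[j]) for j in range(num_regions)]
        routed = sorted(range(num_regions), key=lambda j: affinity[j], reverse=True)[:top_k]
        # Gather token indices from the routed regions only.
        kv_idx = [t for j in routed for t in regions[j]]
        for i in idx:
            weights = softmax([dot(q[i], k[t]) * scale for t in kv_idx])
            out[i] = [sum(w * v[t][d] for w, t in zip(weights, kv_idx)) for d in range(len(v[0]))]
    return out


def full_attention(q, k, v):
    # Dense scaled dot-product attention, for comparison.
    scale = 1.0 / math.sqrt(len(q[0]))
    out = []
    for qi in q:
        weights = softmax([dot(qi, kt) * scale for kt in k])
        out.append([sum(w * vt[d] for w, vt in zip(weights, v)) for d in range(len(v[0]))])
    return out
```

With `top_k` equal to the number of regions, the routed attention reduces to ordinary dense self-attention. Smaller `top_k` values shrink each token's key set from `n` tokens to `top_k * region_size` tokens, which is where the savings in computation come from.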
