Feature Pyramid U-Net with Attention for Semantic Segmentation of Forward-Looking Sonar Images

Affiliations

The College of Computer Science and Technology, Zhejiang University of Technology, Hangzhou 310023, China.

The College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China.

Publication information

Sensors (Basel). 2022 Nov 3;22(21):8468. doi: 10.3390/s22218468.

DOI:10.3390/s22218468

PMID:36366169

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC9653894/

Abstract

Forward-looking sonar is widely used for underwater detection. However, because of their acoustic properties, most sonar images are noisy and have low resolution. In recent years, the semantic segmentation model U-Net has shown excellent segmentation performance, and it has great potential for forward-looking sonar image segmentation. The noise in forward-looking sonar images, however, prevents the existing U-Net model from segmenting small objects effectively. This study therefore presents a forward-looking sonar semantic segmentation model called Feature Pyramid U-Net with Attention (FPUA). The model uses residual blocks so that a deeper network can be trained. To improve segmentation accuracy for small objects, a feature pyramid module combined with an attention structure is introduced, which improves the model's ability to learn both deep semantic information and shallow detail information. The proposed model is first compared with other deep learning models on two datasets, one collected in a tank environment and the other in a real marine environment. To further test the validity of the model, a real forward-looking sonar system was devised and used in lake trials. The results show that the proposed model outperforms the other models on small-object and few-sample classes and that it is competitive in semantic segmentation of forward-looking sonar images.
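
The emphasis on small-object and few-sample classes can be illustrated with a small sketch of how segmentation quality is often scored per class. The abstract does not name the evaluation metric, so the use of intersection over union (IoU) here is an assumption, and the function names, the nested-list label-map representation, and the toy 4x4 data are all hypothetical. The point is only that overall pixel accuracy can look high while a small object is half missed, which a per-class score makes visible.

```python
from typing import Dict, List

# Hypothetical representation: a label map is a list of rows of integer class ids.
LabelMap = List[List[int]]


def per_class_iou(pred: LabelMap, target: LabelMap, num_classes: int) -> Dict[int, float]:
    """Return IoU for each class id; classes absent from both maps are skipped."""
    ious: Dict[int, float] = {}
    for cls in range(num_classes):
        intersection = 0
        union = 0
        for pred_row, target_row in zip(pred, target):
            for p, t in zip(pred_row, target_row):
                in_pred = p == cls
                in_target = t == cls
                if in_pred and in_target:
                    intersection += 1
                if in_pred or in_target:
                    union += 1
        if union > 0:
            ious[cls] = intersection / union
    return ious


def pixel_accuracy(pred: LabelMap, target: LabelMap) -> float:
    """Fraction of pixels whose predicted class matches the target class."""
    correct = 0
    total = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            correct += int(p == t)
            total += 1
    return correct / total


# Toy data (not from the paper): class 0 is background, class 1 is a 2-pixel object.
target = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
# The prediction finds only one of the two object pixels.
pred = [
    [0, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]

ious = per_class_iou(pred, target, num_classes=2)
mean_iou = sum(ious.values()) / len(ious)

print(pixel_accuracy(pred, target))  # 0.9375
print(ious[1])                       # 0.5
print(f"{mean_iou:.3f}")             # 0.717
```

In this toy case a single missed pixel leaves pixel accuracy at 0.9375 but halves the IoU of the small object, so averaging IoU over classes weights small and rare classes far more heavily than a pixel-level score does.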
