Ye Renchuan, Qian Yuqiang, Huang Xinming
Department of Electronic Information Engineering, School of Ocean Engineering, Jiangsu University of Science and Technology, Zhenjiang 212003, China.
Sensors (Basel). 2024 Sep 11;24(18):5893. doi: 10.3390/s24185893.
Recently, transformers have demonstrated notable improvements in advanced vision tasks on natural images. In computer vision, transformer networks are beginning to supplant conventional convolutional neural networks (CNNs) owing to their global receptive field and adaptability. Although transformers excel at capturing global features, they lag behind CNNs in handling fine local features, especially in underwater images that contain complex and delicate structures. To address this challenge, we propose a refined transformer model that improves its feature blocks (dilated transformer blocks) so that attention weights are computed more accurately, enhancing the capture of both local and global features. Next, a self-supervised method (a local and global blind-patch network) is embedded in the bottleneck layer; it aggregates local and global information to enhance detail recovery and improve the quality of texture restoration. Additionally, we introduce a multi-scale convolutional block attention module (MSCBAM) to connect encoder and decoder features; this module enhances the feature representation of the color channels, aiding the restoration of color information in images. Our model is named the refined transformer combined with convolutional block attention module (RT-CBAM). We plan to deploy it on the sensors of underwater robots for real-world underwater image-processing and ocean exploration tasks. We compare our approach with two traditional methods and six deep learning methods, and it achieves the best results in terms of detail processing and color restoration.
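To make the attention module mentioned above more concrete, the following is a minimal NumPy sketch of the standard convolutional block attention module (CBAM, Woo et al., ECCV 2018), which applies channel attention followed by spatial attention. It is not the authors' multi-scale MSCBAM, whose details are not given in this abstract. The function names, the reduction ratio `r`, the 7x7 spatial kernel, and the random weights are illustrative assumptions; a real model would learn these weights inside a deep learning framework.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def channel_attention(x, w1, w2):
    """Reweight channels of x (C, H, W) using a shared two-layer MLP.

    w1: (C // r, C) and w2: (C, C // r) are hypothetical MLP weights.
    """
    def mlp(v):
        return w2 @ np.maximum(w1 @ v, 0.0)  # ReLU hidden layer

    avg = x.mean(axis=(1, 2))  # global average pooling -> (C,)
    mx = x.max(axis=(1, 2))    # global max pooling -> (C,)
    weights = sigmoid(mlp(avg) + mlp(mx))  # per-channel weights in (0, 1)
    return x * weights[:, None, None]


def conv2d_same(img, kernel):
    """Cross-correlate img (2, H, W) with kernel (2, k, k); zero padding keeps (H, W)."""
    _, h, w = img.shape
    k = kernel.shape[-1]
    p = k // 2
    padded = np.pad(img, ((0, 0), (p, p), (p, p)))
    out = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(padded[:, i:i + k, j:j + k] * kernel)
    return out


def spatial_attention(x, kernel):
    """Reweight spatial positions of x (C, H, W) using channel-wise avg/max maps."""
    stacked = np.stack([x.mean(axis=0), x.max(axis=0)])  # (2, H, W)
    weights = sigmoid(conv2d_same(stacked, kernel))      # (H, W), values in (0, 1)
    return x * weights[None, :, :]


def cbam(x, w1, w2, kernel):
    # Channel attention first, then spatial attention, as in the original CBAM.
    return spatial_attention(channel_attention(x, w1, w2), kernel)


# Usage with random (untrained) weights on an 8-channel 16x16 feature map.
rng = np.random.default_rng(0)
C, H, W, r = 8, 16, 16, 2
x = rng.standard_normal((C, H, W))
w1 = rng.standard_normal((C // r, C)) * 0.1
w2 = rng.standard_normal((C, C // r)) * 0.1
kernel = rng.standard_normal((2, 7, 7)) * 0.1
y = cbam(x, w1, w2, kernel)
print(y.shape)  # (8, 16, 16)
```

Because both attention maps pass through a sigmoid, the module only rescales features and never changes the feature map's shape, which is what lets such a block sit on an encoder-decoder skip connection.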