Xu Lingfeng, Zou Qiang
School of Microelectronics, Tianjin University, Tianjin 300072, China.
Tianjin International Joint Research Center for Internet of Things, Tianjin 300072, China.
Sensors (Basel). 2024 Jun 5;24(11):3665. doi: 10.3390/s24113665.
The aim of infrared and visible image fusion is to generate a fused image that not only contains salient targets and rich texture details but also facilitates high-level vision tasks. However, owing to the hardware limitations of digital cameras and other devices, existing datasets contain many low-resolution images, which often suffer from the loss of detail and structural information. Moreover, existing fusion algorithms focus heavily on the visual quality of the fused images while ignoring the requirements of high-level vision tasks. To address these challenges, this paper unites a super-resolution network, a fusion network, and a segmentation network, proposing a super-resolution-based semantic-aware fusion network. First, we design a super-resolution network built on a multi-branch hybrid attention module (MHAM) to enhance the quality and detail of the source images, enabling the fusion network to integrate their features more accurately. Then, a comprehensive information extraction module (STDC) is designed in the fusion network to strengthen its ability to extract finer-grained complementary information from the source images. Finally, the fusion network and segmentation network are trained jointly, using a semantic loss to feed semantic information back into the fusion network, which effectively improves the performance of the fused images on high-level vision tasks. Extensive experiments show that our method outperforms other state-of-the-art image fusion methods. In particular, our fused images not only offer excellent visual perception but also help improve performance on high-level vision tasks.