Zhang Xuhui, Yin Yunpeng, Wang Zhuowei, Wu Heng, Cheng Lianglun, Yang Aimin, Zhao Genping
Guangdong Provincial Key Laboratory of Cyber-Physical System, Guangdong University of Technology, Guangzhou 510006, China.
School of Computer, Guangdong University of Technology, Guangzhou 510006, China.
Sensors (Basel). 2025 Apr 22;25(9):2646. doi: 10.3390/s25092646.
The fusion of infrared and visible images provides complementary information from both modalities and has been widely used in surveillance, military, and other fields. However, most available fusion methods have only been evaluated with subjective visual-quality metrics of the fused images, which are often decoupled from the relevant downstream high-level visual tasks. Moreover, although fusion is particularly useful in low-light scenarios, the effect of low-light conditions on the fusion result has not yet been well addressed. To address these challenges, a decoupled and semantic segmentation-driven infrared and visible image fusion network is proposed in this paper, which couples image fusion with the downstream task so that both jointly drive the optimization of the network. Firstly, a cross-modality transformer fusion module is designed to learn rich hierarchical feature representations. Secondly, a semantic-driven fusion module is developed to enhance the key features of prominent targets. Thirdly, a weighted fusion strategy is adopted to automatically adjust the fusion weights of the different modality features, effectively merging the thermal characteristics of infrared images with the detailed information of visible images. Additionally, we design a refined loss function that employs the decoupling network to constrain the pixel distributions in the fused images and produce more natural fused images. To evaluate the robustness and generalization of the proposed method in challenging practical applications, a Maritime Infrared and Visible (MIV) dataset is created and verified for maritime environmental perception, and it will be made available soon. Experimental results on both widely used public datasets and the practically collected MIV dataset highlight the notable strengths of the proposed method, which achieves the best-ranking quality metrics among its counterparts. More importantly, the fused images produced by the proposed method achieve over 96% target detection accuracy and an mAP@[50:95] value that far surpasses all competitors.
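To illustrate the kind of weighted fusion strategy described above, the following is a minimal sketch (not the authors' code) of how learned, per-pixel fusion weights could combine infrared and visible feature maps. It assumes PyTorch, and the module and variable names (WeightedFusion, weight_net, feat_ir, feat_vis) are hypothetical; the paper's actual architecture, including the cross-modality transformer and semantic-driven modules, is more elaborate.

```python
# Minimal sketch of a weighted fusion of infrared and visible feature maps.
# Assumes PyTorch; all names are hypothetical and not from the paper's code.
import torch
import torch.nn as nn

class WeightedFusion(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        # Predict a per-pixel weight map from the concatenated modality features.
        self.weight_net = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, 1, kernel_size=1),
            nn.Sigmoid(),  # w in (0, 1): share given to the infrared branch
        )

    def forward(self, feat_ir: torch.Tensor, feat_vis: torch.Tensor) -> torch.Tensor:
        w = self.weight_net(torch.cat([feat_ir, feat_vis], dim=1))
        # Convex combination: thermal cues where w is high, visible detail elsewhere.
        return w * feat_ir + (1.0 - w) * feat_vis

# Example: fuse 64-channel feature maps extracted from the two modalities.
fusion = WeightedFusion(channels=64)
fused = fusion(torch.randn(1, 64, 128, 128), torch.randn(1, 64, 128, 128))
print(fused.shape)  # torch.Size([1, 64, 128, 128])
```

In this sketch the weight map is learned end to end, so the network can automatically favor infrared features around thermally prominent targets and visible features in texture-rich regions, in the spirit of the adaptive weighting the abstract describes.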