Rao Dongyu, Xu Tianyang, Wu Xiao-Jun
IEEE Trans Image Process. 2023 May 10;PP. doi: 10.1109/TIP.2023.3273451.
End-to-end image fusion frameworks have achieved promising performance, with dedicated convolutional networks aggregating multi-modal local appearance. However, existing CNN fusion approaches neglect long-range dependencies, hindering balanced image-level perception in complex-scenario fusion. In this paper, we therefore propose an infrared and visible image fusion algorithm based on a transformer module and adversarial learning. Motivated by the transformer's global interaction capability, we use it to learn effective global fusion relations. In particular, shallow features extracted by a CNN interact in the proposed transformer fusion module, refining the fusion relationship within the spatial scope and across channels simultaneously. In addition, adversarial learning is introduced during training to improve the discrimination of the output by imposing competitive consistency with the inputs, reflecting the specific characteristics of infrared and visible images. Experimental results demonstrate the effectiveness of the proposed modules, with clear improvements over the state-of-the-art, establishing a new paradigm for the fusion task based on transformers and adversarial learning.
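The core idea of attending over both spatial positions and channels of concatenated shallow features can be sketched as follows. This is a minimal NumPy illustration of the general mechanism, not the authors' actual module: the function names, the single-head attention, and the additive combination of the two attention paths are assumptions made for clarity.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(tokens):
    """Scaled dot-product self-attention over a token sequence of shape (n, d)."""
    n, d = tokens.shape
    scores = tokens @ tokens.T / np.sqrt(d)   # pairwise token affinities (n, n)
    return softmax(scores, axis=-1) @ tokens  # re-weighted tokens (n, d)

def fuse_features(feat_ir, feat_vis):
    """Illustrative fusion of shallow IR/visible features, each (c, h, w):
    concatenate along channels, then attend spatially (tokens = pixels)
    and across channels (tokens = channel maps)."""
    f = np.concatenate([feat_ir, feat_vis], axis=0)  # (2c, h, w)
    c2, h, w = f.shape
    flat = f.reshape(c2, h * w)
    # Spatial attention: each pixel is a token with 2c-dim features.
    spatial = self_attention(flat.T).T.reshape(c2, h, w)
    # Channel attention: each channel map is a token with h*w-dim features.
    channel = self_attention(flat).reshape(c2, h, w)
    return spatial + channel  # combined global fusion map (2c, h, w)
```

In practice the paper's module would use learned query/key/value projections and be trained end-to-end with the adversarial objective; the sketch only shows why attention captures long-range dependencies that local convolutions miss.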