CTFusion: CNN-transformer-based self-supervised learning for infrared and visible image fusion.

Author information

Du Keying, Fang Liuyang, Chen Jie, Chen Dongdong, Lai Hua

Affiliations

Yunnan Key Laboratory of Digital Communications, Yunnan Communications Investment & Construction Group Company Limited, Kunming, China.

Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming, China.

Publication information

Math Biosci Eng. 2024 Jul 30;21(7):6710-6730. doi: 10.3934/mbe.2024294.

DOI:10.3934/mbe.2024294

PMID:39176416

Abstract

Infrared and visible image fusion (IVIF) aims to extract and integrate useful complementary information from multi-modal source images. Current fusion methods usually require a large number of paired images to train models in a supervised or unsupervised manner. In this paper, we propose CTFusion, a convolutional neural network (CNN)-Transformer-based IVIF framework that uses self-supervised learning. The framework is built on an encoder-decoder network, in which the encoders gain strong local and global dependency modeling ability through the CNN-Transformer-based feature extraction (CTFE) module. Thanks to self-supervised learning with a simple pretext task, model training does not require ground-truth fused images. We designed a mask reconstruction task according to the characteristics of IVIF, through which the network learns the characteristics of both infrared and visible images and extracts more generalized features. We evaluated our method against five competitive traditional and deep learning-based methods on three IVIF benchmark datasets. Extensive experimental results show that CTFusion achieves the best performance among the compared state-of-the-art methods in both subjective and objective evaluations.
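The mask reconstruction pretext task mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the masking scheme, patch size, mask ratio, or loss, so everything below is an assumption chosen for illustration: square non-overlapping patches, zero-filling of hidden pixels, and a mean squared error computed only over hidden pixels. The names `mask_patches` and `masked_mse` are hypothetical, and this is not the authors' implementation.

```python
import random


def mask_patches(image, patch, ratio, rng):
    """Zero out a random subset of non-overlapping patch x patch blocks.

    Returns the masked copy and a same-sized 0/1 mask (1 = pixel was hidden).
    The zero-fill and the block layout are illustrative assumptions.
    """
    h, w = len(image), len(image[0])
    # Top-left corners of every full patch in the grid.
    corners = [(r, c)
               for r in range(0, h - patch + 1, patch)
               for c in range(0, w - patch + 1, patch)]
    hidden = rng.sample(corners, round(ratio * len(corners)))
    masked = [row[:] for row in image]
    mask = [[0] * w for _ in range(h)]
    for r, c in hidden:
        for i in range(r, r + patch):
            for j in range(c, c + patch):
                masked[i][j] = 0.0
                mask[i][j] = 1
    return masked, mask


def masked_mse(pred, target, mask):
    """Mean squared error over hidden pixels only (assumed loss)."""
    total, count = 0.0, 0
    for p_row, t_row, m_row in zip(pred, target, mask):
        for p, t, m in zip(p_row, t_row, m_row):
            if m:
                total += (p - t) ** 2
                count += 1
    return total / count if count else 0.0


# Toy 8x8 "image" with intensities in [0, 1]; stands in for an infrared
# or visible source image.
rng = random.Random(0)
image = [[(i * 8 + j) / 63 for j in range(8)] for i in range(8)]
masked, mask = mask_patches(image, patch=2, ratio=0.5, rng=rng)

hidden_pixels = sum(map(sum, mask))
loss_identity = masked_mse(image, image, mask)   # perfect reconstruction
loss_unfilled = masked_mse(masked, image, mask)  # masked input left as-is
```

In a self-supervised setup of this kind, an encoder-decoder network would receive `masked` and be trained to reduce a loss such as `masked_mse` against the original source image, so no ground-truth fused image is needed during training.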
