Liu Yu, Yu Chen, Cheng Juan, Wang Z Jane, Chen Xun
IEEE Trans Image Process. 2024;33:2197-2212. doi: 10.1109/TIP.2024.3374072. Epub 2024 Mar 25.
Anatomical and functional image fusion is an important technique in a variety of medical and biological applications. Recently, deep learning (DL)-based methods have become a mainstream direction in the field of multi-modal image fusion. However, existing DL-based fusion approaches have difficulty effectively capturing local features and global contextual information simultaneously. In addition, the scale diversity of features, which is a crucial issue in image fusion, often lacks adequate attention in most existing works. In this paper, to address the above problems, we propose a MixFormer-based multi-scale network, termed MM-Net, for anatomical and functional image fusion. In our method, an improved MixFormer-based backbone is introduced to sufficiently extract both local features and global contextual information at multiple scales from the source images. The features from different source images are fused at multiple scales by a multi-source spatial attention-based cross-modality feature fusion (CMFF) module. The scale diversity of the fused features is further enriched by a series of multi-scale feature interaction (MSFI) modules and feature aggregation upsample (FAU) modules. Moreover, a loss function comprising both spatial-domain and frequency-domain components is devised to train the proposed fusion model. Experimental results demonstrate that our method outperforms several state-of-the-art fusion methods in both qualitative and quantitative comparisons, and the proposed fusion model exhibits good generalization capability. The source code of our fusion method will be available at https://github.com/yuliu316316.
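The abstract gives no implementation details, so the following is a minimal PyTorch sketch, under stated assumptions, of two of the ideas it mentions: per-pixel fusion weights derived from spatial attention over multiple source features, and a training loss that combines a spatial-domain term with a frequency-domain term. All names (MultiSourceSpatialAttentionFusion, spatial_frequency_loss, alpha, beta, the pixel-wise-maximum reference) are hypothetical illustrations and are not taken from the MM-Net paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class MultiSourceSpatialAttentionFusion(nn.Module):
    """Illustrative cross-modality fusion: each source feature map predicts a
    spatial attention map; the maps are normalized across sources and used as
    per-pixel fusion weights (a generic stand-in for a CMFF-style module)."""

    def __init__(self, channels: int):
        super().__init__()
        # One lightweight conv per source to predict a single-channel attention map.
        self.attn_a = nn.Conv2d(channels, 1, kernel_size=3, padding=1)
        self.attn_b = nn.Conv2d(channels, 1, kernel_size=3, padding=1)

    def forward(self, feat_a: torch.Tensor, feat_b: torch.Tensor) -> torch.Tensor:
        # Spatial attention logits for each modality, shape (B, 1, H, W).
        logit_a = self.attn_a(feat_a)
        logit_b = self.attn_b(feat_b)
        # Softmax across the source dimension yields per-pixel fusion weights.
        weights = torch.softmax(torch.cat([logit_a, logit_b], dim=1), dim=1)
        w_a, w_b = weights[:, 0:1], weights[:, 1:2]
        return w_a * feat_a + w_b * feat_b


def spatial_frequency_loss(fused: torch.Tensor,
                           source_a: torch.Tensor,
                           source_b: torch.Tensor,
                           alpha: float = 1.0,
                           beta: float = 0.1) -> torch.Tensor:
    """Hypothetical combined loss with a spatial-domain and a frequency-domain
    component; the pixel-wise maximum of the sources is used here only as a
    simple intensity-preservation reference, not the paper's actual target."""
    spatial_ref = torch.maximum(source_a, source_b)
    loss_spatial = F.l1_loss(fused, spatial_ref)

    # Frequency-domain term: L1 distance between amplitude spectra, encouraging
    # the fused image to retain the reference's frequency content.
    fft_fused = torch.fft.fft2(fused, norm="ortho")
    fft_ref = torch.fft.fft2(spatial_ref, norm="ortho")
    loss_freq = F.l1_loss(torch.abs(fft_fused), torch.abs(fft_ref))

    return alpha * loss_spatial + beta * loss_freq
```

As a usage sketch, features from the two modalities at one scale would be passed through the fusion module, and the final fused image together with the two source images would be fed to the loss during training; the relative weights alpha and beta are placeholders to be tuned.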