Zheng Junze, Xiao Junyan, Wang Yaowei, Zhang Xuming
Department of Biomedical Engineering, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China.
Sensors (Basel). 2024 May 30;24(11):3545. doi: 10.3390/s24113545.
Multi-modal medical image fusion (MMIF) is crucial for disease diagnosis and treatment because the images reconstructed from signals collected by different sensors can provide complementary information. In recent years, deep learning (DL) based methods have been widely used in MMIF. However, these methods often adopt a serial fusion strategy without feature decomposition, causing error accumulation and confusion of characteristics across different scales. To address these issues, we propose the Coupled Image Reconstruction and Fusion (CIRF) strategy. Our method runs the image fusion and reconstruction branches in parallel, linked by a common encoder. First, CIRF uses the lightweight encoder to extract base and detail features through the Vision Transformer (ViT) and Convolutional Neural Network (CNN) branches, respectively, where the two branches interact to supplement each other's information. Then, the two types of features are fused separately via different blocks and finally decoded into the fusion result. The loss function includes both the supervised loss from the reconstruction branch and the unsupervised loss from the fusion branch. As a whole, CIRF increases its expressivity by adding multi-task learning and feature decomposition. Additionally, we explore the impact of image masking on the network's feature extraction ability and validate the generalization capability of the model. Experiments on three datasets demonstrate, both subjectively and objectively, that the images fused by CIRF exhibit appropriate brightness and smooth edge transitions, with more competitive evaluation metrics than those fused by several other traditional and DL-based methods.
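The coupled design described in the abstract can be sketched in code: a shared encoder decomposes each input into "base" features (global context, via a ViT-like attention branch) and "detail" features (local texture, via a CNN branch); the fusion branch merges features from two modalities while the reconstruction branch decodes single-modality features back to the input, so both a supervised reconstruction loss and an unsupervised fusion loss can be applied. All module sizes, the max-fusion rule, and the loss terms below are illustrative assumptions, not the authors' implementation.

```python
# Minimal PyTorch sketch of a CIRF-style coupled encoder/decoder (assumed design).
import torch
import torch.nn as nn
import torch.nn.functional as F

class SharedEncoder(nn.Module):
    def __init__(self, ch=16):
        super().__init__()
        # Detail branch: small CNN capturing local edges/texture.
        self.detail = nn.Sequential(
            nn.Conv2d(1, ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(ch, ch, 3, padding=1),
        )
        # Base branch: lightweight ViT stand-in (self-attention over 8x8 patches).
        self.patch = nn.Conv2d(1, ch, kernel_size=8, stride=8)  # patch embedding
        self.attn = nn.MultiheadAttention(ch, num_heads=2, batch_first=True)
        self.up = nn.Upsample(scale_factor=8, mode="nearest")

    def forward(self, x):
        d = self.detail(x)                           # B, ch, H, W
        p = self.patch(x)                            # B, ch, H/8, W/8
        b, c, h, w = p.shape
        tokens = p.flatten(2).transpose(1, 2)        # B, N, ch
        tokens, _ = self.attn(tokens, tokens, tokens)
        base = self.up(tokens.transpose(1, 2).reshape(b, c, h, w))
        return base, d

class Decoder(nn.Module):
    def __init__(self, ch=16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(2 * ch, ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(ch, 1, 3, padding=1),
        )

    def forward(self, base, detail):
        return self.net(torch.cat([base, detail], dim=1))

enc, dec = SharedEncoder(), Decoder()
a = torch.rand(1, 1, 64, 64)   # modality 1 (e.g. an MRI slice)
b = torch.rand(1, 1, 64, 64)   # modality 2 (e.g. a PET slice)
base_a, det_a = enc(a)
base_b, det_b = enc(b)
# Fusion branch: base and detail features fused separately, then decoded.
fused = dec(torch.max(base_a, base_b), torch.max(det_a, det_b))
# Reconstruction branch: the same encoder/decoder reconstructs one input.
recon = dec(base_a, det_a)
# Supervised reconstruction loss + a placeholder unsupervised fusion loss.
loss = F.l1_loss(recon, a) + F.l1_loss(fused, 0.5 * (a + b))
```

Sharing the encoder between the two branches is what couples the tasks: gradients from the supervised reconstruction loss regularize the same features the fusion branch consumes.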