IEEE Trans Image Process. 2022;31:5134-5149. doi: 10.1109/TIP.2022.3193288. Epub 2022 Aug 2.
Owing to the limitations of imaging sensors, it is challenging to obtain a medical image that simultaneously contains functional metabolic information and structural tissue details. Multimodal medical image fusion, an effective way to merge the complementary information of different modalities, has become a significant technique for facilitating clinical diagnosis and surgical navigation. With their powerful feature representation ability, deep learning (DL)-based methods have improved fusion results but still fall short of satisfactory performance. Specifically, existing DL-based methods generally depend on convolutional operations, which extract local patterns well but have limited capability in preserving global context information. To compensate for this defect and achieve accurate fusion, we propose a novel unsupervised method that fuses multimodal medical images via a multiscale adaptive Transformer, termed MATR. Instead of directly employing vanilla convolution, we introduce an adaptive convolution that modulates the convolutional kernel based on the global complementary context. To further model long-range dependencies, an adaptive Transformer is employed to enhance the global semantic extraction capability. The network is designed in a multiscale fashion so that useful multimodal information can be adequately acquired across different scales. Moreover, an objective function composed of a structural loss and a region mutual information loss is devised to enforce information preservation at both the structural level and the feature level. Extensive experiments on a mainstream database demonstrate that the proposed method outperforms other representative and state-of-the-art methods in terms of both visual quality and quantitative evaluation. We also extend the proposed method to other biomedical image fusion tasks, and the promising fusion results illustrate that MATR has good generalization capability. The code of the proposed method is available at https://github.com/tthinking/MATR.
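For readers who want a concrete picture of the "adaptive convolution" idea mentioned in the abstract, the following is a minimal, hypothetical PyTorch sketch in which a convolution's output is modulated by a global context vector pooled from the input. It is not the authors' implementation (the official code is at https://github.com/tthinking/MATR); the class name `AdaptiveConv2d`, the pooling-plus-MLP modulation scheme, and all layer sizes are illustrative assumptions only.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdaptiveConv2d(nn.Module):
    """Convolution whose output channels are rescaled by factors predicted
    from the globally pooled input, so the effective kernel adapts to the
    content of each input (a dynamic-convolution-style interpretation)."""

    def __init__(self, in_ch, out_ch, kernel_size=3, reduction=4):
        super().__init__()
        self.padding = kernel_size // 2
        # Base (static) kernel shared across all inputs.
        self.weight = nn.Parameter(
            torch.randn(out_ch, in_ch, kernel_size, kernel_size) * 0.02)
        hidden = max(in_ch // reduction, 4)
        # Small MLP mapping the global context vector to per-channel scales.
        self.context_mlp = nn.Sequential(
            nn.Linear(in_ch, hidden),
            nn.ReLU(inplace=True),
            nn.Linear(hidden, out_ch),
            nn.Sigmoid(),
        )

    def forward(self, x):
        b = x.size(0)
        # Global context: spatial average pooling over the whole feature map.
        context = x.mean(dim=(2, 3))            # (B, in_ch)
        scale = self.context_mlp(context)        # (B, out_ch)
        out = F.conv2d(x, self.weight, padding=self.padding)
        # Modulate each output channel by its input-dependent factor.
        return out * scale.view(b, -1, 1, 1)

if __name__ == "__main__":
    # Toy usage: extract features from a 2-channel stack of co-registered
    # source images (e.g., MRI and PET luminance), purely for illustration.
    layer = AdaptiveConv2d(in_ch=2, out_ch=16)
    feats = layer(torch.randn(1, 2, 256, 256))
    print(feats.shape)  # torch.Size([1, 16, 256, 256])
```

Per-channel rescaling driven by a pooled context vector is only one simple way to make a kernel input-dependent; the paper's actual adaptive convolution and its multiscale Transformer blocks should be taken from the linked repository.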