Deng Xin, Liu Enpeng, Li Shengxi, Duan Yiping, Xu Mai
IEEE Trans Image Process. 2023;32:1078-1091. doi: 10.1109/TIP.2023.3240024. Epub 2023 Feb 7.
Multi-modal image registration aims to spatially align two images from different modalities so that their feature points match each other. Captured by different sensors, images from different modalities often contain many distinct features, which makes it challenging to find accurate correspondences between them. With the success of deep learning, many deep networks have been proposed to align multi-modal images; however, most of them lack interpretability. In this paper, we first model the multi-modal image registration problem as a disentangled convolutional sparse coding (DCSC) model. In this model, the multi-modal features that are responsible for alignment (RA features) are well separated from the features that are not responsible for alignment (nRA features). By only allowing the RA features to participate in the deformation field prediction, we eliminate the interference of the nRA features and improve registration accuracy and efficiency. The optimization process of the DCSC model for separating the RA and nRA features is then unrolled into a deep network, namely the Interpretable Multi-modal Image Registration Network (InMIR-Net). To ensure the accurate separation of RA and nRA features, we further design an accompanying guidance network (AG-Net) to supervise the extraction of RA features in InMIR-Net. The advantage of InMIR-Net is that it provides a universal framework to tackle both rigid and non-rigid multi-modal image registration tasks. Extensive experimental results verify the effectiveness of our method on both rigid and non-rigid registration across various multi-modal image datasets, including RGB/depth images, RGB/near-infrared (NIR) images, RGB/multi-spectral images, T1/T2 weighted magnetic resonance (MR) images, and computed tomography (CT)/MR images. The codes are available at https://github.com/lep990816/Interpretable-Multi-modal-Image-Registration.
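To make the described pipeline concrete, the sketch below is a minimal, hypothetical PyTorch analogue (not the authors' released code) of the core idea: an ISTA-style unrolled convolutional sparse coding block disentangles each image's features into RA and nRA codes via two learned convolutional dictionaries, and only the RA codes of the two modalities are fed to a deformation-field head. All module names, channel sizes, iteration counts, and the soft-thresholding update rule are illustrative assumptions, not the InMIR-Net architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def soft_threshold(x, theta):
    """Elementwise soft-thresholding, the proximal operator used in ISTA-style sparse coding."""
    return torch.sign(x) * F.relu(torch.abs(x) - theta)


class DisentangledCSCBlock(nn.Module):
    """One unrolled iteration: update the RA and nRA convolutional sparse codes (illustrative)."""

    def __init__(self, in_ch=1, code_ch=32):
        super().__init__()
        # Analysis/synthesis convolutions play the role of the two dictionaries.
        self.enc_ra = nn.Conv2d(in_ch, code_ch, 3, padding=1)
        self.dec_ra = nn.Conv2d(code_ch, in_ch, 3, padding=1)
        self.enc_nra = nn.Conv2d(in_ch, code_ch, 3, padding=1)
        self.dec_nra = nn.Conv2d(code_ch, in_ch, 3, padding=1)
        self.theta_ra = nn.Parameter(torch.tensor(0.01))
        self.theta_nra = nn.Parameter(torch.tensor(0.01))

    def forward(self, img, z_ra, z_nra):
        # Residual between the image and its current two-part reconstruction.
        residual = img - self.dec_ra(z_ra) - self.dec_nra(z_nra)
        # Gradient step followed by soft-thresholding for each code.
        z_ra = soft_threshold(z_ra + self.enc_ra(residual), self.theta_ra)
        z_nra = soft_threshold(z_nra + self.enc_nra(residual), self.theta_nra)
        return z_ra, z_nra


class ToyDisentangledRegNet(nn.Module):
    """Toy analogue of the idea: unrolled DCSC encoding + a flow head that sees RA codes only."""

    def __init__(self, in_ch=1, code_ch=32, n_iters=4):
        super().__init__()
        self.blocks = nn.ModuleList(
            [DisentangledCSCBlock(in_ch, code_ch) for _ in range(n_iters)]
        )
        # Deformation-field head receives the RA codes of both modalities only.
        self.flow_head = nn.Sequential(
            nn.Conv2d(2 * code_ch, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 2, 3, padding=1),  # 2-channel dense displacement field
        )

    def encode(self, img):
        z_ra = torch.zeros(img.size(0), self.blocks[0].enc_ra.out_channels,
                           img.size(2), img.size(3), device=img.device)
        z_nra = torch.zeros_like(z_ra)
        for block in self.blocks:
            z_ra, z_nra = block(img, z_ra, z_nra)
        return z_ra, z_nra

    def forward(self, moving, fixed):
        z_ra_m, _ = self.encode(moving)  # nRA codes are discarded downstream
        z_ra_f, _ = self.encode(fixed)
        flow = self.flow_head(torch.cat([z_ra_m, z_ra_f], dim=1))
        return flow


if __name__ == "__main__":
    net = ToyDisentangledRegNet()
    moving = torch.randn(1, 1, 64, 64)  # e.g. an MR slice
    fixed = torch.randn(1, 1, 64, 64)   # e.g. the corresponding CT slice
    print(net(moving, fixed).shape)     # torch.Size([1, 2, 64, 64])
```

In this sketch, excluding the nRA codes from the flow head mirrors the paper's claim that only alignment-relevant features should drive the deformation prediction; the AG-Net supervision described in the abstract is not modeled here.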