IEEE Trans Med Imaging. 2020 Jul;39(7):2415-2425. doi: 10.1109/TMI.2019.2963882. Epub 2020 Feb 3.
Multi-modal learning is typically performed with network architectures containing modality-specific layers and shared layers, using co-registered images of different modalities. We propose a novel learning scheme for unpaired cross-modality image segmentation, with a highly compact architecture that achieves superior segmentation accuracy. In our method, we heavily reuse network parameters by sharing all convolutional kernels across CT and MRI, and employ only modality-specific internal normalization layers, which compute their respective statistics. To effectively train such a highly compact model, we introduce a novel loss term inspired by knowledge distillation, which explicitly constrains the KL-divergence between our derived prediction distributions of the two modalities. We have extensively validated our approach on two multi-class segmentation problems: i) cardiac structure segmentation and ii) abdominal organ segmentation. Different network settings, i.e., a 2D dilated network and a 3D U-Net, are used to investigate our method's general efficacy. Experimental results on both tasks demonstrate that our novel multi-modal learning scheme consistently outperforms single-modal training and previous multi-modal approaches.
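To make the parameter-sharing idea concrete, below is a minimal sketch (not the authors' released code) of one convolutional block in which the kernel weights are shared across CT and MRI while each modality keeps its own normalization layer with its own statistics. The class name, channel arguments, and the choice of BatchNorm are illustrative assumptions.

```python
import torch
import torch.nn as nn

class SharedConvModalityNorm(nn.Module):
    """One conv block whose kernels are shared by both modalities,
    with a separate normalization layer per modality ('ct' or 'mr')."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        # shared convolutional kernels: the same weights see CT and MRI
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1)
        # modality-specific normalization: each branch tracks its own statistics
        self.norm = nn.ModuleDict({
            'ct': nn.BatchNorm2d(out_ch),
            'mr': nn.BatchNorm2d(out_ch),
        })
        self.act = nn.ReLU(inplace=True)

    def forward(self, x, modality):
        # only the normalization path depends on the modality tag
        return self.act(self.norm[modality](self.conv(x)))

# usage: the same block serves unpaired CT and MRI batches
block = SharedConvModalityNorm(1, 32)
y_ct = block(torch.randn(2, 1, 64, 64), 'ct')
y_mr = block(torch.randn(2, 1, 64, 64), 'mr')
```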
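Likewise, a hedged sketch of a knowledge-distillation-style alignment term: derive a soft prediction distribution from each modality's segmentation logits and penalize their KL-divergence. The batch-level spatial pooling, the temperature T, and the symmetric form are illustrative assumptions, not necessarily the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def kd_alignment_loss(logits_ct, logits_mr, T=2.0):
    # logits_*: (N, C, H, W) segmentation logits from each modality's batch
    # temperature-softened softmax, pooled to one class distribution per batch
    p_ct = F.softmax(logits_ct / T, dim=1).mean(dim=(0, 2, 3))  # shape (C,)
    p_mr = F.softmax(logits_mr / T, dim=1).mean(dim=(0, 2, 3))  # shape (C,)

    def kl(p, q):
        # KL(p || q) with clamping for numerical stability
        return (p * (p.clamp_min(1e-8) / q.clamp_min(1e-8)).log()).sum()

    # symmetric KL between the two derived prediction distributions
    return 0.5 * (kl(p_ct, p_mr) + kl(p_mr, p_ct))
```

In training, a term like this would be added to the per-modality segmentation losses so that the shared kernels are pushed toward modality-invariant predictions.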