Hybrid Masked Image Modeling for 3D Medical Image Segmentation.

Publication information

IEEE J Biomed Health Inform. 2024 Apr;28(4):2115-2125. doi: 10.1109/JBHI.2024.3360239. Epub 2024 Apr 4.

Abstract

Masked image modeling (MIM) with transformer backbones has recently been exploited as a powerful self-supervised pre-training technique. Existing MIM methods mask random patches of the image and reconstruct the missing pixels, which captures only lower-level semantic information and leads to long pre-training times. This paper presents HybridMIM, a novel hybrid self-supervised learning method based on masked image modeling for 3D medical image segmentation. Specifically, we design a two-level masking hierarchy that specifies which patches in sub-volumes are masked and how, effectively providing the constraints of higher-level semantic information. We then learn the semantic information of medical images at three levels: 1) partial region prediction to reconstruct key contents of the 3D image, which largely reduces the pre-training time burden (pixel-level); 2) patch-masking perception to learn the spatial relationship between the patches in each sub-volume (region-level); and 3) dropout-based contrastive learning between samples within a mini-batch, which further improves the generalization ability of the framework (sample-level). The proposed framework is versatile: it supports both CNN and transformer encoder backbones and also enables pre-training of decoders for image segmentation. We conduct comprehensive experiments on five widely used public medical image segmentation datasets: BraTS2020, BTCV, MSD Liver, MSD Spleen, and BraTS2023. The experimental results show the clear superiority of HybridMIM over competing supervised methods, masked pre-training approaches, and other self-supervised methods in terms of quantitative metrics, speed, and qualitative observations.
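
As a rough illustration of what a two-level masking hierarchy could look like, the sketch below first chooses which sub-volumes are masked at all and then chooses which patches inside each chosen sub-volume are hidden. The abstract does not give the actual procedure, so everything here is an assumption made for illustration: the function name `two_level_mask`, the patch grid and sub-volume sizes, the two masking ratios, and the use of uniform random sampling. It should not be read as the authors' implementation.

```python
import random


def two_level_mask(grid=(4, 4, 4), sub=(2, 2, 2),
                   region_ratio=0.5, patch_ratio=0.5, seed=0):
    """Hypothetical two-level masking over a 3D grid of patches.

    grid: number of patches along each axis of the volume (assumed).
    sub:  number of patches along each axis of one sub-volume (assumed).
    Returns (mask, counts): mask maps each patch index to True if masked,
    counts maps each sub-volume index to its number of masked patches.
    """
    assert all(g % s == 0 for g, s in zip(grid, sub)), "grid must tile evenly"
    rng = random.Random(seed)

    # Enumerate sub-volumes on the coarse grid.
    n_sub = tuple(g // s for g, s in zip(grid, sub))
    sub_ids = [(i, j, k)
               for i in range(n_sub[0])
               for j in range(n_sub[1])
               for k in range(n_sub[2])]

    # Level 1: decide which sub-volumes receive any masking.
    chosen = set(rng.sample(sub_ids, round(region_ratio * len(sub_ids))))

    # Level 2: inside each chosen sub-volume, decide which patches to hide.
    patches_per_sub = sub[0] * sub[1] * sub[2]
    n_masked = round(patch_ratio * patches_per_sub)

    mask = {(z, y, x): False
            for z in range(grid[0])
            for y in range(grid[1])
            for x in range(grid[2])}
    counts = {}
    for sid in sub_ids:
        patches = [(sid[0] * sub[0] + a, sid[1] * sub[1] + b, sid[2] * sub[2] + c)
                   for a in range(sub[0])
                   for b in range(sub[1])
                   for c in range(sub[2])]
        masked = rng.sample(patches, n_masked) if sid in chosen else []
        for p in masked:
            mask[p] = True
        counts[sid] = len(masked)
    return mask, counts


if __name__ == "__main__":
    mask, counts = two_level_mask()
    print(sum(mask.values()), "of", len(mask), "patches masked")
```

With the default (assumed) settings, 4 of the 8 sub-volumes are selected and 4 of the 8 patches in each are hidden, so 16 of 64 patches end up masked. The per-sub-volume `counts` are the kind of region-level signal a patch-masking perception task might use as a target, although the abstract does not say what that task actually predicts.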
