IEEE Trans Med Imaging. 2022 Jul;41(7):1837-1848. doi: 10.1109/TMI.2022.3150682. Epub 2022 Jun 30.
Fully supervised deep learning segmentation models are inflexible when they encounter unseen semantic classes, and fine-tuning them often requires large amounts of annotated data. Few-shot semantic segmentation (FSS) aims to overcome this inflexibility by learning to segment an arbitrary unseen, semantically meaningful class from only a few labeled examples, without fine-tuning. State-of-the-art FSS methods are typically designed for natural images and rely on abundant annotated data of training classes to learn image representations that generalize well to unseen testing classes. However, such a training mechanism is impractical in annotation-scarce medical imaging scenarios. To address this challenge, we propose a novel self-supervised FSS framework for medical images, named SSL-ALPNet, that bypasses the requirement for annotations during training. The proposed method exploits superpixel-based pseudo-labels to provide supervision signals. In addition, we propose a simple yet effective adaptive local prototype pooling module, which is plugged into prototype networks to further boost segmentation accuracy. We demonstrate the general applicability of the proposed approach on three tasks: organ segmentation of abdominal CT images, organ segmentation of abdominal MRI images, and cardiac segmentation of MRI images. In our experiments, the proposed method yields higher Dice scores than conventional FSS methods that require manual annotations for training.
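The two ideas named in the abstract, superpixel-based pseudo-labels and local prototype pooling, can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes a precomputed superpixel map and a feature map that, in practice, would come from a learned encoder. All function names, the window size, and the 0.95 foreground threshold are hypothetical choices made for illustration only.

```python
import numpy as np


def pseudo_label_from_superpixels(superpixels, rng):
    """Pick one superpixel at random and return it as a binary pseudo-label."""
    chosen = rng.choice(np.unique(superpixels))
    return (superpixels == chosen).astype(np.float64)


def pooled_prototypes(features, mask, window, threshold=0.95):
    """Return local prototypes plus one global prototype for a masked class.

    features: (C, H, W) feature map; mask: (H, W) binary mask.
    Local prototypes are window-averaged features from windows whose mean
    mask value exceeds `threshold` (an assumed value). The global prototype
    is the masked average over the whole map.
    """
    c, h, w = features.shape
    gh, gw = h // window, w // window
    # Average-pool features and mask over non-overlapping windows.
    f = features[:, :gh * window, :gw * window]
    f = f.reshape(c, gh, window, gw, window).mean(axis=(2, 4))
    m = mask[:gh * window, :gw * window]
    m = m.reshape(gh, window, gw, window).mean(axis=(1, 3))
    local = f[:, m > threshold].T  # (n_local, C), possibly empty
    global_proto = (features * mask).sum(axis=(1, 2)) / (mask.sum() + 1e-5)
    return np.vstack([local, global_proto[None, :]])


def max_cosine_similarity(features, prototypes):
    """Per-pixel maximum cosine similarity to any prototype, shape (H, W)."""
    c, h, w = features.shape
    q = features.reshape(c, -1)
    q = q / (np.linalg.norm(q, axis=0, keepdims=True) + 1e-8)
    p = prototypes / (np.linalg.norm(prototypes, axis=1, keepdims=True) + 1e-8)
    return (p @ q).max(axis=0).reshape(h, w)


def segment(query_features, support_features, support_mask, window=2):
    """Label a query pixel foreground if it is closer to a foreground prototype."""
    fg = pooled_prototypes(support_features, support_mask, window)
    bg = pooled_prototypes(support_features, 1.0 - support_mask, window)
    fg_sim = max_cosine_similarity(query_features, fg)
    bg_sim = max_cosine_similarity(query_features, bg)
    return (fg_sim > bg_sim).astype(np.float64)


# Toy example: the left half of an 8x8 image is one superpixel (id 1).
superpixels = np.zeros((8, 8), dtype=int)
superpixels[:, :4] = 1
support_mask = (superpixels == 1).astype(np.float64)

# Hypothetical 2-channel features: channel 0 fires on the left half,
# channel 1 on the right half.
features = np.stack([support_mask, 1.0 - support_mask])

# The same features serve as support and query to keep the example tiny.
pred = segment(features, features, support_mask)

# Drawing a random pseudo-label, as a self-supervised episode would.
rng = np.random.default_rng(0)
random_mask = pseudo_label_from_superpixels(superpixels, rng)
```

In a self-supervised training episode, the randomly drawn pseudo-label would play the role of the support mask, and a transformed copy of the same image would act as the query, so no manual annotation is needed. The sketch reuses the same features for both roles only for brevity.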