Center of Advanced Medical Computing and Analysis, Massachusetts General Hospital and Harvard Medical School, Boston, MA 02114, USA.
Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong, China.
Med Image Anal. 2024 Dec;98:103310. doi: 10.1016/j.media.2024.103310. Epub 2024 Aug 22.
The Segment Anything Model (SAM), a foundation model for general image segmentation, has demonstrated impressive zero-shot performance across numerous natural image segmentation tasks. However, SAM's performance declines significantly when applied to medical images, primarily due to the substantial disparity between the natural and medical image domains. To effectively adapt SAM to medical images, it is important to incorporate critical third-dimensional information, i.e., volumetric or temporal knowledge, during fine-tuning. At the same time, we aim to make the fullest use of SAM's pre-trained weights within its original 2D backbone. In this paper, we introduce a modality-agnostic SAM adaptation framework, named MA-SAM, that is applicable to various volumetric and video medical data. Our method is rooted in a parameter-efficient fine-tuning strategy that updates only a small portion of weight increments while preserving the majority of SAM's pre-trained weights. By injecting a series of 3D adapters into the transformer blocks of the image encoder, our method enables the pre-trained 2D backbone to extract third-dimensional information from the input data. We comprehensively evaluate our method on five medical image segmentation tasks, using 11 public datasets spanning CT, MRI, and surgical video data. Remarkably, without using any prompt, our method consistently outperforms various state-of-the-art 3D approaches, surpassing nnU-Net by 0.9%, 2.6%, and 9.9% in Dice for CT multi-organ segmentation, MRI prostate segmentation, and surgical scene segmentation, respectively. Our model also demonstrates strong generalization and excels in challenging tumor segmentation when prompts are used. Our code is available at: https://github.com/cchen-cc/MA-SAM.
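To illustrate the core idea described in the abstract, the sketch below shows one way a lightweight 3D adapter can be inserted into a frozen 2D transformer block so that depth (or temporal) slices folded into the batch can exchange information. This is a minimal PyTorch sketch based only on the abstract's description; the class name `Adapter3D`, the bottleneck ratio, and the depth-wise convolution kernel are illustrative assumptions and are not taken from the released MA-SAM code.

```python
import torch
import torch.nn as nn


class Adapter3D(nn.Module):
    """Illustrative bottleneck adapter adding third-dimensional (depth/temporal)
    context to a frozen 2D ViT block. Names and hyperparameters are assumptions."""

    def __init__(self, dim: int, bottleneck_ratio: float = 0.25, depth_kernel: int = 3):
        super().__init__()
        hidden = int(dim * bottleneck_ratio)
        self.down = nn.Linear(dim, hidden)          # project tokens to a low-rank space
        self.conv3d = nn.Conv3d(hidden, hidden,
                                kernel_size=(depth_kernel, 1, 1),
                                padding=(depth_kernel // 2, 0, 0))  # mix along depth only
        self.up = nn.Linear(hidden, dim)            # project back to the token dimension
        self.act = nn.GELU()

    def forward(self, x: torch.Tensor, depth: int) -> torch.Tensor:
        # x: (B*D, H, W, C) patch tokens from a 2D encoder block,
        # where the D depth/temporal slices are folded into the batch axis.
        bd, h, w, c = x.shape
        b = bd // depth
        y = self.act(self.down(x))                                   # (B*D, H, W, hidden)
        y = y.view(b, depth, h, w, -1).permute(0, 4, 1, 2, 3)        # (B, hidden, D, H, W)
        y = self.act(self.conv3d(y))                                 # exchange info across slices
        y = y.permute(0, 2, 3, 4, 1).reshape(bd, h, w, -1)           # back to (B*D, H, W, hidden)
        return x + self.up(y)                                        # residual update only


# Parameter-efficient fine-tuning pattern implied by the abstract (sketch):
# freeze the pre-trained 2D backbone and train only the injected adapters.
def mark_trainable(backbone: nn.Module, adapters: nn.ModuleList) -> None:
    for p in backbone.parameters():
        p.requires_grad_(False)      # keep SAM's pre-trained weights intact
    for p in adapters.parameters():
        p.requires_grad_(True)       # update only the small adapter increments
```

In this reading, each adapter contributes only a residual correction on top of the frozen 2D features, which is consistent with the abstract's claim that most of SAM's pre-trained weights are preserved while third-dimensional information is learned from the data.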