Lin Xian, Xiang Yangyang, Wang Zhehao, Cheng Kwang-Ting, Yan Zengqiang, Yu Li
IEEE Trans Med Imaging. 2025 Mar;44(3):1386-1399. doi: 10.1109/TMI.2024.3493456. Epub 2025 Mar 17.
The Segment Anything Model (SAM), a foundation model with superior versatility and generalization across diverse segmentation tasks, has attracted widespread attention in medical imaging. However, SAM has been shown to suffer severe performance degradation on medical images, owing to the absence of medical knowledge in its training data and its weak encoding of local features. Although several SAM-based models have been proposed to adapt SAM to medical imaging, they still suffer from insufficient feature extraction and rely heavily on high-quality prompts. In this paper, we propose SAMCT, a powerful foundation model that allows labor-free prompts, and train it on a large collected CT dataset comprising 1.1M CT images and 5M masks from public datasets. Specifically, building on SAM, SAMCT is further equipped with a U-shaped CNN image encoder, a cross-branch interaction module, and a task-indicator prompt encoder. The U-shaped CNN image encoder works in parallel with SAM's ViT image encoder to supplement local features. The cross-branch interaction module strengthens the feature representations of both encoders by exchanging global perception and local features between them. The task-indicator prompt encoder is a plug-and-play component that effortlessly encodes task-related indicators into prompt embeddings. In this way, SAMCT can work in a fully automatic manner in addition to the semi-automatic interactive strategy of SAM. Extensive experiments demonstrate the superiority of SAMCT over state-of-the-art task-specific and SAM-based medical foundation models across various tasks. The code, data, and model checkpoints are available at https://github.com/xianlin7/SAMCT.
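To make the two architectural ideas in the abstract concrete, below is a minimal PyTorch sketch of (a) a cross-branch interaction step that exchanges global (ViT) and local (CNN) features via cross-attention, and (b) a task-indicator prompt encoder that maps a discrete task ID to prompt embeddings in place of manual clicks or boxes. This is not the authors' implementation (see the GitHub repository for that); all module names, dimensions, and wiring here are illustrative assumptions.

```python
# Illustrative sketch only; the actual SAMCT code is at
# https://github.com/xianlin7/SAMCT. Names and sizes are hypothetical.
import torch
import torch.nn as nn

class CrossBranchInteraction(nn.Module):
    """Exchange global (ViT) and local (CNN) features via cross-attention."""
    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.cnn_from_vit = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.vit_from_cnn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, cnn_tokens: torch.Tensor, vit_tokens: torch.Tensor):
        # The CNN branch queries global perception from the ViT branch,
        # and the ViT branch queries local detail from the CNN branch.
        cnn_out, _ = self.cnn_from_vit(cnn_tokens, vit_tokens, vit_tokens)
        vit_out, _ = self.vit_from_cnn(vit_tokens, cnn_tokens, cnn_tokens)
        return cnn_tokens + cnn_out, vit_tokens + vit_out

class TaskIndicatorPromptEncoder(nn.Module):
    """Map a task indicator (e.g., an organ index) to learned prompt
    embeddings, replacing manually supplied points/boxes."""
    def __init__(self, num_tasks: int, dim: int, tokens_per_task: int = 4):
        super().__init__()
        self.prompts = nn.Embedding(num_tasks, tokens_per_task * dim)
        self.tokens_per_task = tokens_per_task
        self.dim = dim

    def forward(self, task_id: torch.Tensor) -> torch.Tensor:
        b = task_id.shape[0]
        return self.prompts(task_id).view(b, self.tokens_per_task, self.dim)

# Usage: fuse the two branches, then build prompts from task IDs alone.
cbi = CrossBranchInteraction(dim=256)
prompt_enc = TaskIndicatorPromptEncoder(num_tasks=10, dim=256)
cnn_tokens = torch.randn(2, 196, 256)   # local features from the CNN branch
vit_tokens = torch.randn(2, 196, 256)   # global features from the ViT branch
cnn_tokens, vit_tokens = cbi(cnn_tokens, vit_tokens)
prompts = prompt_enc(torch.tensor([3, 7]))  # task IDs instead of clicks/boxes
```

Under this reading, the prompt embeddings feed SAM's mask decoder exactly where point/box embeddings normally would, which is what lets the model run without human interaction.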