Wahd Assefa S, Kupper Jessica, Jaremko Jacob L, Hareendranathan Abhilash R
Annu Int Conf IEEE Eng Med Biol Soc. 2024 Jul;2024:1-4. doi: 10.1109/EMBC53108.2024.10782494.
Segment Anything Model (SAM) is a foundation model that can be conditioned on sparse prompts, such as boxes or points, and dense prompts, such as masks. SAM outputs binary masks based on the given prompts but lacks semantic understanding, as it does not predict the class of the output mask. We propose Semantic AutoSAM, a semantic segmentation model that builds on SAM's binary segmentation. Semantic AutoSAM replaces SAM's manual prompt encoder with a lightweight cross-attention module, enabling it to predict prompt embeddings directly from the image features. This eliminates the need for manual prompting. In experiments on the FLAIR 2022 dataset (20 CT scans) and a hip ultrasound dataset (4849 2D images), Semantic AutoSAM matches the performance of ground-truth bounding-box prompting for most organs. Our method achieves a Dice score of 0.62 on the FLAIR dataset, versus 0.7 for MobileSAM prompted with the ground-truth box. On the hip ultrasound dataset, our approach achieves a Dice score of 0.83, surpassing MobileSAM's 0.81 even though MobileSAM had access to the ground-truth box at prediction time. Notably, our method requires no manual prompts at test time.
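The core idea, predicting prompt embeddings from image features via cross-attention instead of encoding manual boxes or points, can be illustrated with a minimal sketch. This is a hypothetical PyTorch module, not the authors' implementation: the number of queries, embedding width, and head count are assumed, and `PromptPredictor` is an illustrative name. Learnable query vectors attend over the flattened image-encoder features to produce prompt embeddings that could be fed to a SAM-style mask decoder.

```python
import torch
import torch.nn as nn

class PromptPredictor(nn.Module):
    """Hypothetical cross-attention prompt predictor (sketch only).

    Learnable queries attend over flattened image features (e.g. from
    SAM's image encoder) to produce prompt embeddings, replacing
    manually supplied box/point prompts.
    """

    def __init__(self, embed_dim: int = 256, num_queries: int = 8, num_heads: int = 8):
        super().__init__()
        # One learnable query per candidate prompt embedding (count is an assumption).
        self.queries = nn.Parameter(torch.randn(num_queries, embed_dim))
        self.attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(embed_dim)

    def forward(self, image_feats: torch.Tensor) -> torch.Tensor:
        # image_feats: (B, C, H, W) feature map from the image encoder.
        b, c, h, w = image_feats.shape
        kv = image_feats.flatten(2).transpose(1, 2)      # (B, H*W, C) keys/values
        q = self.queries.unsqueeze(0).expand(b, -1, -1)  # (B, Q, C) queries
        out, _ = self.attn(q, kv, kv)                    # cross-attention over image tokens
        return self.norm(out)                            # (B, Q, C) prompt embeddings

feats = torch.randn(2, 256, 64, 64)  # stand-in for an image-encoder output
prompts = PromptPredictor()(feats)
print(prompts.shape)
```

Because the queries are learned end-to-end with the segmentation loss, each can specialize toward a semantic class, which is one plausible way such a module supplies the class information that vanilla SAM lacks.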