SimpleClick: Interactive Image Segmentation with Simple Vision Transformers

Authors

Liu Qin, Xu Zhenlin, Bertasius Gedas, Niethammer Marc

Affiliation

University of North Carolina at Chapel Hill.

Publication

Proc IEEE Int Conf Comput Vis. 2023 Oct;2023:22233-22243. doi: 10.1109/iccv51070.2023.02037.

DOI: 10.1109/iccv51070.2023.02037

PMID: 39247160

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11378330/

Abstract

Click-based interactive image segmentation aims at extracting objects with a limited number of user clicks. A hierarchical backbone is the de facto architecture for current methods. Recently, the plain, non-hierarchical Vision Transformer (ViT) has emerged as a competitive backbone for dense prediction tasks. This design allows the original ViT to serve as a foundation model that can be finetuned for downstream tasks without redesigning a hierarchical backbone for pretraining. Although this design is simple and has proven effective, it has not yet been explored for interactive image segmentation. To fill this gap, we propose SimpleClick, the first interactive segmentation method that leverages a plain backbone. Based on the plain backbone, we introduce a symmetric patch embedding layer that encodes clicks into the backbone with only minor modifications to the backbone itself. With the plain backbone pretrained as a masked autoencoder (MAE), SimpleClick achieves state-of-the-art performance. Remarkably, our method's NoC@90 on SBD (the number of clicks needed to reach 90% IoU; lower is better) improves over the previous best result. Extensive evaluation on medical images demonstrates the generalizability of our method. We provide a detailed computational analysis, highlighting the suitability of our method as a practical annotation tool.
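The abstract names the symmetric patch embedding layer without showing how clicks reach the backbone. The following is a minimal sketch under stated assumptions, not the authors' implementation: we assume positive and negative clicks are rendered as disk maps and stacked with the previous mask into a 3-channel click map, and that this map passes through a second patch embedding with the same patch size and output width as the image's patch embedding, the two token sequences being summed element-wise before entering the ViT. All names (`encode_clicks`, `PatchEmbed`, `SymmetricPatchEmbed`), the disk radius, and the random weights are hypothetical; toy sizes keep it in pure Python.

```python
import random
from typing import List, Sequence, Tuple

# Hypothetical toy types: images and click maps are H x W x C nested lists.
Map3 = List[List[List[float]]]
Tokens = List[List[float]]


def disk_map(h: int, w: int, clicks: Sequence[Tuple[int, int]], radius: int) -> List[List[float]]:
    """Binary H x W map with a filled disk of `radius` pixels around each (row, col) click."""
    out = [[0.0] * w for _ in range(h)]
    for cy, cx in clicks:
        for y in range(h):
            for x in range(w):
                if (y - cy) ** 2 + (x - cx) ** 2 <= radius ** 2:
                    out[y][x] = 1.0
    return out


def encode_clicks(h, w, pos_clicks, neg_clicks, prev_mask, radius=2) -> Map3:
    """Assumed encoding: channels = [positive disks, negative disks, previous mask]."""
    pos = disk_map(h, w, pos_clicks, radius)
    neg = disk_map(h, w, neg_clicks, radius)
    return [[[pos[y][x], neg[y][x], prev_mask[y][x]] for x in range(w)] for y in range(h)]


class PatchEmbed:
    """Split into non-overlapping P x P patches, flatten, and project linearly to `dim`.

    Mirrors a ViT patch embedding (a convolution with kernel size == stride == P).
    """

    def __init__(self, in_ch: int, patch: int, dim: int, seed: int):
        rng = random.Random(seed)  # toy random weights stand in for trained parameters
        fan_in = patch * patch * in_ch
        self.in_ch, self.patch = in_ch, patch
        self.weight = [[rng.gauss(0.0, fan_in ** -0.5) for _ in range(fan_in)] for _ in range(dim)]
        self.bias = [0.0] * dim

    def __call__(self, x: Map3) -> Tokens:
        h, w, p = len(x), len(x[0]), self.patch
        assert h % p == 0 and w % p == 0, "H and W must be multiples of the patch size"
        tokens = []
        for py in range(0, h, p):  # patches in row-major order
            for px in range(0, w, p):
                flat = [x[y][xx][c]
                        for y in range(py, py + p)
                        for xx in range(px, px + p)
                        for c in range(self.in_ch)]
                tokens.append([sum(wi * fi for wi, fi in zip(row, flat)) + b
                               for row, b in zip(self.weight, self.bias)])
        return tokens


class SymmetricPatchEmbed:
    """Image and click map get patch embeddings of identical shape; their tokens are summed."""

    def __init__(self, patch: int = 4, dim: int = 8):
        self.image_embed = PatchEmbed(in_ch=3, patch=patch, dim=dim, seed=0)
        self.click_embed = PatchEmbed(in_ch=3, patch=patch, dim=dim, seed=1)

    def __call__(self, image: Map3, click_map: Map3) -> Tokens:
        a = self.image_embed(image)
        b = self.click_embed(click_map)
        # Element-wise sum keeps token count and width, so the ViT body needs no change.
        return [[u + v for u, v in zip(ta, tb)] for ta, tb in zip(a, b)]


if __name__ == "__main__":
    H = W = 8
    rng = random.Random(42)
    image = [[[rng.random() for _ in range(3)] for _ in range(W)] for _ in range(H)]
    no_prev_mask = [[0.0] * W for _ in range(H)]
    click_map = encode_clicks(H, W, pos_clicks=[(2, 3)], neg_clicks=[(6, 6)], prev_mask=no_prev_mask)
    tokens = SymmetricPatchEmbed(patch=4, dim=8)(image, click_map)
    print(len(tokens), len(tokens[0]))  # 4 8
```

Because the click branch produces tokens of exactly the same shape as the image branch, adding them leaves the plain ViT untouched, which is the sense in which the modification to the backbone stays minor. With an empty click map and no previous mask, this sketch's click branch contributes zeros and the output reduces to the plain image tokens.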

Figure: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8218/11378330/7a6eaacaa849/nihms-2018064-f0008.jpg

Similar articles

Efficient click-based interactive segmentation for medical image with improved Plain-ViT.

IEEE J Biomed Health Inform. 2024 Apr 24;PP. doi: 10.1109/JBHI.2024.3392893.

Skin Cancer Segmentation and Classification Using Vision Transformer for Automatic Analysis in Dermatoscopy-Based Noninvasive Digital System.

Int J Biomed Imaging. 2024 Feb 3;2024:3022192. doi: 10.1155/2024/3022192. eCollection 2024.

SwinCross: Cross-modal Swin transformer for head-and-neck tumor segmentation in PET/CT images.

Med Phys. 2024 Mar;51(3):2096-2107. doi: 10.1002/mp.16703. Epub 2023 Sep 30.

MAE-TransRNet: An improved transformer-ConvNet architecture with masked autoencoder for cardiac MRI registration.

Front Med (Lausanne). 2023 Mar 9;10:1114571. doi: 10.3389/fmed.2023.1114571. eCollection 2023.

PSAQ-ViT V2: Toward Accurate and General Data-Free Quantization for Vision Transformers.

IEEE Trans Neural Netw Learn Syst. 2024 Dec;35(12):17227-17238. doi: 10.1109/TNNLS.2023.3301007. Epub 2024 Dec 2.

AdaptiveClick: Click-Aware Transformer With Adaptive Focal Loss for Interactive Image Segmentation.

IEEE Trans Neural Netw Learn Syst. 2025 Mar;36(3):5759-5773. doi: 10.1109/TNNLS.2024.3378295. Epub 2025 Feb 28.

Interactive segmentation of medical images using deep learning.

Phys Med Biol. 2024 Feb 5;69(4). doi: 10.1088/1361-6560/ad1cf8.

MIDeepSeg: Minimally interactive segmentation of unseen objects from medical images using deep learning.

Med Image Anal. 2021 Aug;72:102102. doi: 10.1016/j.media.2021.102102. Epub 2021 May 18.

3D face-model reconstruction from a single image: A feature aggregation approach using hierarchical transformer with weak supervision.

Neural Netw. 2022 Dec;156:108-122. doi: 10.1016/j.neunet.2022.09.019. Epub 2022 Oct 1.

Cited by

A narrative review of foundation models for medical image segmentation: zero-shot performance evaluation on diverse modalities.

Quant Imaging Med Surg. 2025 Jun 6;15(6):5825-5858. doi: 10.21037/qims-2024-2826. Epub 2025 Jun 3.

Multi-scheme cross-level attention embedded U-shape transformer for MRI semantic segmentation.

Sci Rep. 2025 Jul 2;15(1):22891. doi: 10.1038/s41598-025-06966-y.

PRISM: A Promptable and Robust Interactive Segmentation Model with Visual Prompts.

Med Image Comput Comput Assist Interv. 2024 Oct;15003:389-399. doi: 10.1007/978-3-031-72384-1_37. Epub 2024 Oct 3.

MiMICRI: Towards Domain-centered Counterfactual Explanations of Cardiovascular Image Classification Models.

FACCT 24 (2024). 2024 Jun;2024:1861-1874. doi: 10.1145/3630106.3659011. Epub 2024 Jun 5.

A Comprehensive Survey of Deep Learning Approaches in Image Processing.

Sensors (Basel). 2025 Jan 17;25(2):531. doi: 10.3390/s25020531.

Penguin colony georegistration using camera pose estimation and phototourism.

PLoS One. 2024 Oct 30;19(10):e0311038. doi: 10.1371/journal.pone.0311038. eCollection 2024.

Pixel Diffuser: Practical Interactive Medical Image Segmentation without Ground Truth.

Bioengineering (Basel). 2023 Nov 2;10(11):1280. doi: 10.3390/bioengineering10111280.

Three-dimensional surface motion capture of multiple freely moving pigs using MAMMAL.

Nat Commun. 2023 Nov 25;14(1):7727. doi: 10.1038/s41467-023-43483-w.

Segment anything model for medical image analysis: An experimental study.

Med Image Anal. 2023 Oct;89:102918. doi: 10.1016/j.media.2023.102918. Epub 2023 Aug 2.

本文引用的文献

Interactive Object Segmentation With Inside-Outside Guidance.基于内外引导的交互式目标分割。

IEEE Trans Pattern Anal Mach Intell. 2023 Jul;45(7):8594-8605. doi: 10.1109/TPAMI.2022.3227116. Epub 2023 Jun 5.

Automated segmentation of knee bone and cartilage combining statistical shape knowledge and convolutional neural networks: Data from the Osteoarthritis Initiative.基于统计形状知识和卷积神经网络的膝关节骨和软骨自动分割：来自 Osteoarthritis Initiative 的数据。

Med Image Anal. 2019 Feb;52:109-118. doi: 10.1016/j.media.2018.11.009. Epub 2018 Nov 17.

A survey on deep learning in medical image analysis.深度学习在医学图像分析中的应用研究综述。

Med Image Anal. 2017 Dec;42:60-88. doi: 10.1016/j.media.2017.07.005. Epub 2017 Jul 26.

Deep Learning in Medical Image Analysis.医学图像分析中的深度学习

Annu Rev Biomed Eng. 2017 Jun 21;19:221-248. doi: 10.1146/annurev-bioeng-071516-044442. Epub 2017 Mar 9.

Random walks for image segmentation.用于图像分割的随机游走算法

IEEE Trans Pattern Anal Mach Intell. 2006 Nov;28(11):1768-83. doi: 10.1109/TPAMI.2006.233.
