SimpleClick: Interactive Image Segmentation with Simple Vision Transformers.

Authors

Liu Qin, Xu Zhenlin, Bertasius Gedas, Niethammer Marc

Affiliation

University of North Carolina at Chapel Hill.

Publication

Proc IEEE Int Conf Comput Vis. 2023 Oct;2023:22233-22243. doi: 10.1109/iccv51070.2023.02037.


DOI:10.1109/iccv51070.2023.02037
PMID:39247160
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11378330/
Abstract

Click-based interactive image segmentation aims at extracting objects with a limited user clicking. A hierarchical backbone is the architecture for current methods. Recently, the plain, non-hierarchical Vision Transformer (ViT) has emerged as a competitive backbone for dense prediction tasks. This design allows the original ViT to be a foundation model that can be finetuned for downstream tasks without redesigning a hierarchical backbone for pretraining. Although this design is simple and has been proven effective, it has not yet been explored for interactive image segmentation. To fill this gap, we propose SimpleClick, the first interactive segmentation method that leverages a plain backbone. Based on the plain backbone, we introduce a symmetric patch embedding layer that encodes clicks into the backbone with minor modifications to the backbone itself. With the plain backbone pretrained as a masked autoencoder (MAE), SimpleClick achieves state-of-the-art performance. Remarkably, our method achieves NoC@90 on SBD, improving over the previous best result. Extensive evaluation on medical images demonstrates the generalizability of our method. We provide a detailed computational analysis, highlighting the suitability of our method as a practical annotation tool.

Similar articles

[1]
SimpleClick: Interactive Image Segmentation with Simple Vision Transformers.

Proc IEEE Int Conf Comput Vis. 2023-10

[2]
Efficient click-based interactive segmentation for medical image with improved Plain-ViT.

IEEE J Biomed Health Inform. 2024-4-24

[3]
Skin Cancer Segmentation and Classification Using Vision Transformer for Automatic Analysis in Dermatoscopy-Based Noninvasive Digital System.

Int J Biomed Imaging. 2024-2-3

[4]
SwinCross: Cross-modal Swin transformer for head-and-neck tumor segmentation in PET/CT images.

Med Phys. 2024-3

[5]
MAE-TransRNet: An improved transformer-ConvNet architecture with masked autoencoder for cardiac MRI registration.

Front Med (Lausanne). 2023-3-9

[6]
PSAQ-ViT V2: Toward Accurate and General Data-Free Quantization for Vision Transformers.

IEEE Trans Neural Netw Learn Syst. 2024-12

[7]
AdaptiveClick: Click-Aware Transformer With Adaptive Focal Loss for Interactive Image Segmentation.

IEEE Trans Neural Netw Learn Syst. 2025-3

[8]
Interactive segmentation of medical images using deep learning.

Phys Med Biol. 2024-2-5

[9]
MIDeepSeg: Minimally interactive segmentation of unseen objects from medical images using deep learning.

Med Image Anal. 2021-8

[10]
3D face-model reconstruction from a single image: A feature aggregation approach using hierarchical transformer with weak supervision.

Neural Netw. 2022-12

Cited by

[1]
A narrative review of foundation models for medical image segmentation: zero-shot performance evaluation on diverse modalities.

Quant Imaging Med Surg. 2025-6-6

[2]
Multi-scheme cross-level attention embedded U-shape transformer for MRI semantic segmentation.

Sci Rep. 2025-7-2

[3]
PRISM: A Promptable and Robust Interactive Segmentation Model with Visual Prompts.

Med Image Comput Comput Assist Interv. 2024-10

[4]
MiMICRI: Towards Domain-centered Counterfactual Explanations of Cardiovascular Image Classification Models.

FACCT 24 (2024). 2024-6

[5]
A Comprehensive Survey of Deep Learning Approaches in Image Processing.

Sensors (Basel). 2025-1-17

[6]
Penguin colony georegistration using camera pose estimation and phototourism.

PLoS One. 2024

[7]
Pixel Diffuser: Practical Interactive Medical Image Segmentation without Ground Truth.

Bioengineering (Basel). 2023-11-2

[8]
Three-dimensional surface motion capture of multiple freely moving pigs using MAMMAL.

Nat Commun. 2023-11-25

[9]
Segment anything model for medical image analysis: An experimental study.

Med Image Anal. 2023-10
