
SimpleClick: Interactive Image Segmentation with Simple Vision Transformers

Authors

Liu Qin, Xu Zhenlin, Bertasius Gedas, Niethammer Marc

Affiliations

University of North Carolina at Chapel Hill.

Publication

Proc IEEE Int Conf Comput Vis. 2023 Oct;2023:22233-22243. doi: 10.1109/iccv51070.2023.02037.

Abstract

Click-based interactive image segmentation aims to extract objects with a limited number of user clicks. Current methods rely on hierarchical backbones. Recently, the plain, non-hierarchical Vision Transformer (ViT) has emerged as a competitive backbone for dense prediction tasks. This design allows the original ViT to serve as a foundation model that can be finetuned for downstream tasks without redesigning a hierarchical backbone for pretraining. Although this design is simple and has been proven effective, it has not yet been explored for interactive image segmentation. To fill this gap, we propose SimpleClick, the first interactive segmentation method that leverages a plain backbone. On top of the plain backbone, we introduce a symmetric patch embedding layer that encodes clicks into the backbone with minor modifications to the backbone itself. With the plain backbone pretrained as a masked autoencoder (MAE), SimpleClick achieves state-of-the-art performance. Remarkably, our method improves NoC@90 on SBD over the previous best result. Extensive evaluation on medical images demonstrates the generalizability of our method. We also provide a detailed computational analysis, highlighting the suitability of our method as a practical annotation tool.

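The abstract describes a symmetric patch embedding layer: user clicks are rasterized into maps, passed through their own patch embedding, and fused with the image's patch embedding before entering the plain ViT, leaving the backbone itself nearly unchanged. A minimal NumPy sketch of that idea (the function names, the binary disk-map click encoding, and the zero-initialized click projection are illustrative assumptions, not the authors' implementation):

```python
import numpy as np

def patch_embed(x, weight, patch=16):
    """Flatten non-overlapping patches of x (H, W, C) and project to D dims."""
    H, W, C = x.shape
    gh, gw = H // patch, W // patch
    patches = x[:gh * patch, :gw * patch].reshape(gh, patch, gw, patch, C)
    patches = patches.transpose(0, 2, 1, 3, 4).reshape(gh * gw, patch * patch * C)
    return patches @ weight  # (num_patches, D)

def click_disk_map(shape, clicks, radius=5):
    """Encode clicks as binary disks: channel 0 = positive, channel 1 = negative."""
    H, W = shape
    m = np.zeros((H, W, 2), dtype=np.float32)
    yy, xx = np.mgrid[0:H, 0:W]
    for cy, cx, positive in clicks:
        disk = (yy - cy) ** 2 + (xx - cx) ** 2 <= radius ** 2
        m[..., 0 if positive else 1][disk] = 1.0
    return m

# Symmetric embedding: image and click map each get their own projection;
# the two token sequences are summed before the transformer blocks.
rng = np.random.default_rng(0)
H, W, D, P = 64, 64, 32, 16
img = rng.standard_normal((H, W, 3)).astype(np.float32)
clicks = click_disk_map((H, W), [(20, 20, True), (50, 40, False)])
w_img = (rng.standard_normal((P * P * 3, D)) * 0.02).astype(np.float32)
w_clk = np.zeros((P * P * 2, D), dtype=np.float32)  # zero-init: clicks start as a no-op
tokens = patch_embed(img, w_img, P) + patch_embed(clicks, w_clk, P)
print(tokens.shape)  # (16, 32)
```

Zero-initializing the click branch means a freshly added click embedding does not perturb the pretrained image features at the start of finetuning, which is one common way to graft a new input modality onto an MAE-pretrained backbone.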



