Coarse Mask Guided Interactive Object Segmentation.

Authors

Li Jing, Fan Junsong, Wang Yuxi, Yang Yuran, Zhang Zhaoxiang

Publication information

IEEE Trans Image Process. 2023;32:5808-5822. doi: 10.1109/TIP.2023.3322564. Epub 2023 Oct 26.

DOI:10.1109/TIP.2023.3322564

Abstract

Interactive object segmentation aims to produce object masks from user interactions such as clicks, bounding boxes, and scribbles. Clicks are the most popular interactive cue because of their efficiency, and click-based deep learning methods have attracted considerable interest in recent years. Most works encode click points as Gaussian maps and concatenate them with the image as the model's input. However, the spatial and semantic information in the Gaussian maps becomes noisy as it passes through multiple convolution layers and is not fully exploited by the top layers for mask prediction. To pass click information to the top layers accurately and efficiently, we propose a coarse mask guided model (CMG), which predicts coarse masks with a coarse module to guide object mask prediction. Specifically, the coarse module encodes user clicks as query features and enriches their semantic information with backbone features through transformer layers; coarse masks are then generated from the enriched query features and fed into CMG's decoder. Benefiting from the efficiency of the transformer, CMG's coarse module and decoder module are lightweight and computationally efficient, which makes the interaction process smoother. Experiments on several segmentation benchmarks demonstrate the effectiveness of our method, which achieves new state-of-the-art results compared with previous works.
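As an illustration of the conventional input encoding the abstract refers to (click points rendered as Gaussian maps and concatenated with the image), the following is a minimal sketch, not the authors' code and not the CMG coarse module itself. The function names, the `sigma` parameter, the use of the maximum to merge several clicks, and the split into positive and negative click channels are all assumptions made for illustration. Plain Python lists are used so the sketch has no dependencies; a real pipeline would use an array library.

```python
import math


def gaussian_click_map(height, width, clicks, sigma=10.0):
    """Render clicks as a height x width map.

    Each pixel holds the maximum over all clicks of
    exp(-d^2 / (2 * sigma^2)), where d is the distance to the click.
    `clicks` is a list of (row, col) tuples; an empty list gives all zeros.
    """
    heat = [[0.0] * width for _ in range(height)]
    for cy, cx in clicks:
        for y in range(height):
            for x in range(width):
                d2 = (y - cy) ** 2 + (x - cx) ** 2
                value = math.exp(-d2 / (2.0 * sigma ** 2))
                if value > heat[y][x]:
                    heat[y][x] = value
    return heat


def build_model_input(image, positive_clicks, negative_clicks, sigma=10.0):
    """Concatenate click maps with the image along the channel axis.

    `image` is a list of channels, each a height x width list of lists.
    Assumption: one extra channel for positive (object) clicks and one
    for negative (background) clicks.
    """
    height, width = len(image[0]), len(image[0][0])
    pos_map = gaussian_click_map(height, width, positive_clicks, sigma)
    neg_map = gaussian_click_map(height, width, negative_clicks, sigma)
    return list(image) + [pos_map, neg_map]


# Hypothetical 3-channel, 6 x 8 image with one click of each kind.
image = [[[0.0] * 8 for _ in range(6)] for _ in range(3)]
model_input = build_model_input(image, [(2, 3)], [(5, 7)], sigma=2.0)
```

Here `model_input` has five channels: the three image channels followed by the two click maps. In this kind of encoding the click information enters only at the input, which is the limitation the abstract raises before proposing query features and coarse masks as a more direct path to the decoder.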

Similar articles

One-Click-Based Perception for Interactive Image Segmentation.

IEEE Trans Neural Netw Learn Syst. 2024 Oct;35(10):13975-13989. doi: 10.1109/TNNLS.2023.3274127. Epub 2024 Oct 7.

Coarse-to-fine prior-guided attention network for multi-structure segmentation on dental panoramic radiographs.

Phys Med Biol. 2023 Oct 26;68(21). doi: 10.1088/1361-6560/ad0218.

PIMedSeg: Progressive interactive medical image segmentation.

Comput Methods Programs Biomed. 2023 Nov;241:107776. doi: 10.1016/j.cmpb.2023.107776. Epub 2023 Aug 25.

Self Supervised Progressive Network for High Performance Video Object Segmentation.

IEEE Trans Neural Netw Learn Syst. 2024 Jun;35(6):7671-7684. doi: 10.1109/TNNLS.2022.3219936. Epub 2024 Jun 3.

A Holistically-Guided Decoder for Deep Representation Learning With Applications to Semantic Segmentation and Object Detection.

IEEE Trans Pattern Anal Mach Intell. 2023 Oct;45(10):11390-11406. doi: 10.1109/TPAMI.2021.3114342. Epub 2023 Sep 5.

Bilateral attention decoder: A lightweight decoder for real-time semantic segmentation.

Neural Netw. 2021 May;137:188-199. doi: 10.1016/j.neunet.2021.01.021. Epub 2021 Jan 30.

VLT: Vision-Language Transformer and Query Generation for Referring Segmentation.

IEEE Trans Pattern Anal Mach Intell. 2023 Jun;45(6):7900-7916. doi: 10.1109/TPAMI.2022.3217852. Epub 2023 May 5.

Click to Correction: Interactive Bidirectional Dynamic Propagation Video Object Segmentation Network.

Sensors (Basel). 2024 Oct 2;24(19):6405. doi: 10.3390/s24196405.

Coarse-to-Fine Semantic Segmentation From Image-Level Labels.

IEEE Trans Image Process. 2020;29:225-236. doi: 10.1109/TIP.2019.2926748. Epub 2019 Jul 12.
