TOSQ: Transparent Object Segmentation via Query-Based Dictionary Lookup with Transformers.

Author information

Ma Bin, Ma Ming, Li Ruiguang, Zheng Jiawei, Li Deping

Affiliations

College of Electronic and Information Engineering, Hubei University of Automotive Technology, No. 167 Checheng West Road, Shiyan 442002, China.

School of Intelligent Systems Science and Engineering, Jinan University, No. 206 Qianshan Road, Zhuhai 519070, China.

Publication information

Sensors (Basel). 2025 Jul 30;25(15):4700. doi: 10.3390/s25154700.

DOI:10.3390/s25154700

PMID:40807866

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC12349598/

Abstract

Sensing transparent objects has many applications in daily life, including robot navigation and grasping. The task is challenging because the scene visible through or behind a transparent object is unpredictable: transparent objects lack fixed visual patterns and suffer from strong background interference. This paper addresses transparent object segmentation by leveraging the intrinsic global modeling capability of transformer architectures. We design a Query Parsing Module (QPM) that formulates segmentation as a dictionary lookup problem, in contrast to conventional pixel-wise mechanisms: a set of learnable class prototypes serves as the query inputs, and the lookup is performed via attention-based prototype matching. Based on QPM, we propose Transparent Object Segmentation through Query (TOSQ), a high-performance transformer-based end-to-end segmentation model. TOSQ's encoder is based on the SegFormer backbone, and its decoder is a series of QPMs that progressively refine the segmentation masks. TOSQ achieves state-of-the-art performance on the Trans10K-V2 dataset (76.63% mIoU, 95.34% Acc), with particularly large gains in challenging categories such as windows (+23.59%) and glass doors (+11.22%).
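The dictionary-lookup idea can be illustrated with a minimal sketch. This is not the authors' QPM: the abstract does not specify its layers, so the scaled dot-product scoring, the softmax over classes, and the names `lookup_segment`, `prototypes`, and `pixel_features` are all assumptions made for illustration. In a real model the prototypes would be learned and the features would come from the encoder.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def lookup_segment(prototypes, pixel_features):
    """Label each pixel with the class whose prototype matches it best.

    prototypes:     K class vectors acting as queries (hypothetical;
                    fixed here, learnable in a trained model).
    pixel_features: N feature vectors, one per pixel (hypothetical
                    stand-in for flattened encoder output).
    Returns (labels, probs): the argmax class per pixel and the
    per-pixel distribution over the K classes.
    """
    # Assumption: scaled dot-product similarity, as in standard attention.
    scale = 1.0 / math.sqrt(len(prototypes[0]))
    labels, probs = [], []
    for feature in pixel_features:
        scores = [scale * dot(query, feature) for query in prototypes]
        p = softmax(scores)  # normalise over classes for this pixel
        probs.append(p)
        labels.append(max(range(len(p)), key=p.__getitem__))
    return labels, probs


if __name__ == "__main__":
    # Two toy class prototypes and three toy pixel features.
    protos = [[1.0, 0.0], [0.0, 1.0]]
    pixels = [[2.0, 0.1], [0.2, 3.0], [1.5, 0.0]]
    labels, _ = lookup_segment(protos, pixels)
    print(labels)  # [0, 1, 0]
```

Each class prototype plays the role of a dictionary key, and the mask for a class is simply the set of pixels that look that key up most strongly. Stacking several such lookups, with the prototypes updated between stages, is one plausible reading of the "progressive refinement" described above, but the abstract does not confirm the details.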
