Transformer-Based Visual Segmentation: A Survey.

Author information

Li Xiangtai, Ding Henghui, Yuan Haobo, Zhang Wenwei, Pang Jiangmiao, Cheng Guangliang, Chen Kai, Liu Ziwei, Loy Chen Change

Publication information

IEEE Trans Pattern Anal Mach Intell. 2024 Dec;46(12):10138-10163. doi: 10.1109/TPAMI.2024.3434373. Epub 2024 Nov 6.

Abstract

Visual segmentation seeks to partition images, video frames, or point clouds into multiple segments or groups. This technique has numerous real-world applications, such as autonomous driving, image editing, robot sensing, and medical analysis. Over the past decade, deep learning-based methods have made remarkable strides in this area. Recently, transformers, a type of neural network based on self-attention originally designed for natural language processing, have considerably surpassed previous convolutional or recurrent approaches in various vision processing tasks. Specifically, vision transformers offer robust, unified, and even simpler solutions for various segmentation tasks. This survey provides a thorough overview of transformer-based visual segmentation, summarizing recent advancements. We first review the background, encompassing problem definitions, datasets, and prior convolutional methods. Next, we summarize a meta-architecture that unifies all recent transformer-based approaches. Based on this meta-architecture, we examine various method designs, including modifications to the meta-architecture and associated applications. We also present several specific subfields, including 3D point cloud segmentation, foundation model tuning, domain-aware segmentation, efficient segmentation, and medical segmentation. Additionally, we compile and re-evaluate the reviewed methods on several well-established datasets. Finally, we identify open challenges in this field and propose directions for future research.

