
Enhancing Mask Transformer with Auxiliary Convolution Layers for Semantic Segmentation.

Affiliation

Department of Electrical and Computer Engineering, Illinois Institute of Technology, Chicago, IL 60616, USA.

Publication Information

Sensors (Basel). 2023 Jan 4;23(2):581. doi: 10.3390/s23020581.

Abstract

Transformer-based semantic segmentation methods have achieved excellent performance in recent years. Mask2Former is a well-known transformer-based method that unifies common image segmentation tasks into a universal model. However, it performs relatively poorly at capturing local features and segmenting small objects because it relies heavily on transformers. To this end, we propose a simple yet effective architecture that introduces auxiliary branches to Mask2Former during training to capture dense local features on the encoder side. The obtained features help improve the learning of local information and the segmentation of small objects. Since the proposed auxiliary convolution layers are required only for training and can be removed during inference, the performance gain is obtained without additional computation at inference. Experimental results show that our model achieves state-of-the-art performance on the ADE20K (57.6% mIoU) and Cityscapes (84.8% mIoU) datasets.
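
The abstract only sketches the design, but the core idea — an auxiliary convolutional branch that provides dense supervision during training and is dropped at inference — can be illustrated with a minimal PyTorch-style sketch. Everything below is an illustrative assumption rather than the paper's actual implementation: the toy encoder, the placement of the auxiliary head, the module names, and the 0.4 loss weight are all placeholders standing in for the Mask2Former pipeline.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ToyEncoder(nn.Module):
    """Stand-in for the real backbone; returns a shallow (local) and a deep feature map."""
    def __init__(self):
        super().__init__()
        self.stage1 = nn.Sequential(nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU())
        self.stage2 = nn.Sequential(nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU())

    def forward(self, x):
        f1 = self.stage1(x)   # higher-resolution feature map with more local detail
        f2 = self.stage2(f1)  # deeper feature map fed to the main head
        return f1, f2


class AuxiliaryConvHead(nn.Module):
    """Auxiliary convolution branch: maps an encoder feature map to per-pixel
    class logits; used only to provide dense supervision during training."""
    def __init__(self, in_ch, num_classes, hidden=256):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, hidden, 3, padding=1, bias=False),
            nn.BatchNorm2d(hidden),
            nn.ReLU(inplace=True),
            nn.Conv2d(hidden, num_classes, 1),
        )

    def forward(self, feat):
        return self.block(feat)


class SegmentorWithAuxBranch(nn.Module):
    """Attaches the auxiliary head to a shallow encoder feature; the branch is
    skipped entirely in eval mode, so it adds no cost at inference."""
    def __init__(self, num_classes=19):
        super().__init__()
        self.encoder = ToyEncoder()
        self.main_head = nn.Conv2d(128, num_classes, 1)      # stand-in for the Mask2Former decoder
        self.aux_head = AuxiliaryConvHead(64, num_classes)   # training-only auxiliary branch

    def forward(self, images):
        f1, f2 = self.encoder(images)
        main_logits = self.main_head(f2)
        if self.training:
            return main_logits, self.aux_head(f1)
        return main_logits


# Training-time usage: combine the main loss with a weighted auxiliary loss.
model = SegmentorWithAuxBranch(num_classes=19)
model.train()
images = torch.randn(2, 3, 128, 128)
labels = torch.randint(0, 19, (2, 128, 128))
main_logits, aux_logits = model(images)
main_up = F.interpolate(main_logits, size=labels.shape[-2:], mode="bilinear", align_corners=False)
aux_up = F.interpolate(aux_logits, size=labels.shape[-2:], mode="bilinear", align_corners=False)
loss = F.cross_entropy(main_up, labels) + 0.4 * F.cross_entropy(aux_up, labels)  # 0.4 is an assumed weight

# At inference the auxiliary branch is never executed (and could be deleted from the checkpoint).
model.eval()
with torch.no_grad():
    pred = model(images).argmax(dim=1)
```

Because the auxiliary path is gated on the module's training mode, evaluating or exporting the model incurs no extra computation, which is consistent with the abstract's claim that the performance gain comes without additional cost at inference.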

