
Enhancing Mask Transformer with Auxiliary Convolution Layers for Semantic Segmentation.

Affiliation

Department of Electrical and Computer Engineering, Illinois Institute of Technology, Chicago, IL 60616, USA.

Publication Information

Sensors (Basel). 2023 Jan 4;23(2):581. doi: 10.3390/s23020581.

Abstract

Transformer-based semantic segmentation methods have achieved excellent performance in recent years. Mask2Former is a well-known transformer-based method that unifies common image segmentation tasks into a universal model. However, it performs relatively poorly at capturing local features and segmenting small objects because it relies heavily on transformers. To this end, we propose a simple yet effective architecture that introduces auxiliary branches to Mask2Former during training to capture dense local features on the encoder side. The obtained features help improve the learning of local information and the segmentation of small objects. Since the proposed auxiliary convolution layers are required only for training and can be removed during inference, the performance gain is obtained without additional computation at inference. Experimental results show that our model achieves state-of-the-art performance on the ADE20K (57.6% mIoU) and Cityscapes (84.8% mIoU) datasets.
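
The abstract only sketches the design, but the core idea — an auxiliary convolutional branch that provides dense supervision during training and is dropped at inference — can be illustrated with a minimal PyTorch-style sketch. Everything below is an illustrative assumption rather than the paper's actual implementation: the toy encoder, the placement of the auxiliary head, the module names, and the 0.4 loss weight are all placeholders standing in for the Mask2Former pipeline.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ToyEncoder(nn.Module):
    """Stand-in for the real backbone; returns a shallow (local) and a deep feature map."""
    def __init__(self):
        super().__init__()
        self.stage1 = nn.Sequential(nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU())
        self.stage2 = nn.Sequential(nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU())

    def forward(self, x):
        f1 = self.stage1(x)   # higher-resolution feature map with more local detail
        f2 = self.stage2(f1)  # deeper feature map fed to the main head
        return f1, f2


class AuxiliaryConvHead(nn.Module):
    """Auxiliary convolution branch: maps an encoder feature map to per-pixel
    class logits; used only to provide dense supervision during training."""
    def __init__(self, in_ch, num_classes, hidden=256):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, hidden, 3, padding=1, bias=False),
            nn.BatchNorm2d(hidden),
            nn.ReLU(inplace=True),
            nn.Conv2d(hidden, num_classes, 1),
        )

    def forward(self, feat):
        return self.block(feat)


class SegmentorWithAuxBranch(nn.Module):
    """Attaches the auxiliary head to a shallow encoder feature; the branch is
    skipped entirely in eval mode, so it adds no cost at inference."""
    def __init__(self, num_classes=19):
        super().__init__()
        self.encoder = ToyEncoder()
        self.main_head = nn.Conv2d(128, num_classes, 1)      # stand-in for the Mask2Former decoder
        self.aux_head = AuxiliaryConvHead(64, num_classes)   # training-only auxiliary branch

    def forward(self, images):
        f1, f2 = self.encoder(images)
        main_logits = self.main_head(f2)
        if self.training:
            return main_logits, self.aux_head(f1)
        return main_logits


# Training-time usage: combine the main loss with a weighted auxiliary loss.
model = SegmentorWithAuxBranch(num_classes=19)
model.train()
images = torch.randn(2, 3, 128, 128)
labels = torch.randint(0, 19, (2, 128, 128))
main_logits, aux_logits = model(images)
main_up = F.interpolate(main_logits, size=labels.shape[-2:], mode="bilinear", align_corners=False)
aux_up = F.interpolate(aux_logits, size=labels.shape[-2:], mode="bilinear", align_corners=False)
loss = F.cross_entropy(main_up, labels) + 0.4 * F.cross_entropy(aux_up, labels)  # 0.4 is an assumed weight

# At inference the auxiliary branch is never executed (and could be deleted from the checkpoint).
model.eval()
with torch.no_grad():
    pred = model(images).argmax(dim=1)
```

Because the auxiliary path is gated on the module's training mode, evaluating or exporting the model incurs no extra computation, which is consistent with the abstract's claim that the performance gain comes without additional cost at inference.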

