MResTNet: A Multi-Resolution Transformer Framework with CNN Extensions for Semantic Segmentation.

Authors

Detsikas Nikolaos, Mitianoudis Nikolaos, Pratikakis Ioannis

Affiliation

Electrical and Computer Engineering Department, Democritus University of Thrace, University Campus Xanthi-Kimmeria, 67100 Xanthi, Greece.

Publication

J Imaging. 2024 May 21;10(6):125. doi: 10.3390/jimaging10060125.

Abstract

Semantic segmentation, a fundamental task in computer vision, differentiates and identifies the objects or entities in a visual scene. Transformer networks have surpassed traditional convolutional neural network (CNN) architectures in segmentation performance. However, the continuous pursuit of better scores on popular evaluation metrics has led to very large architectures that require significant computational power, making them prohibitive for real-time applications such as autonomous driving. In this paper, we propose a model that pairs a visual transformer encoder with a parallel twin decoder, consisting of a visual transformer decoder and a CNN decoder with multi-resolution connections. The two decoders are merged with the aid of two trainable CNN blocks: the fuser, which combines the information from the two decoders, and the scaler, which scales the contribution of each decoder. The proposed model achieves state-of-the-art performance on the Cityscapes and ADE20K datasets while remaining a low-complexity network that can be used in real-time applications.
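As a rough illustration of the merging step mentioned in the abstract, the sketch below combines two decoder outputs with a gating block and a mixing block. It is not the authors' implementation: the abstract does not specify the layer layout, so the 1x1 convolutions, the softmax gating, the order of the two blocks, and every name here (`conv1x1`, `merge_decoders`, `scaler_w`, `fuser_w`, and so on) are assumptions made for this example, and random arrays stand in for trained weights and decoder outputs.

```python
import numpy as np


def conv1x1(x, weight, bias):
    """1x1 convolution: x is (C_in, H, W), weight is (C_out, C_in), bias is (C_out,)."""
    return np.einsum("oc,chw->ohw", weight, x) + bias[:, None, None]


def softmax(x, axis=0):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def merge_decoders(vit_logits, cnn_logits, params):
    """Merge two decoder outputs, each of shape (num_classes, H, W).

    Hypothetical layout, assumed for illustration only:
      scaler: 1x1 conv + softmax giving one per-pixel weight per decoder
      fuser:  1x1 conv mixing the two weighted streams back to num_classes
    """
    stacked = np.concatenate([vit_logits, cnn_logits], axis=0)  # (2C, H, W)
    gates = softmax(
        conv1x1(stacked, params["scaler_w"], params["scaler_b"]), axis=0
    )  # (2, H, W); the two weights sum to 1 at every pixel
    scaled = np.concatenate(
        [gates[0] * vit_logits, gates[1] * cnn_logits], axis=0
    )  # (2C, H, W)
    fused = conv1x1(scaled, params["fuser_w"], params["fuser_b"])  # (C, H, W)
    return fused, gates


# Usage with random stand-ins for the decoder outputs and the trained weights.
rng = np.random.default_rng(0)
num_classes, height, width = 19, 4, 8
vit_logits = rng.normal(size=(num_classes, height, width))
cnn_logits = rng.normal(size=(num_classes, height, width))
params = {
    "scaler_w": rng.normal(size=(2, 2 * num_classes)),
    "scaler_b": np.zeros(2),
    "fuser_w": rng.normal(size=(num_classes, 2 * num_classes)),
    "fuser_b": np.zeros(num_classes),
}
fused, gates = merge_decoders(vit_logits, cnn_logits, params)
prediction = fused.argmax(axis=0)  # per-pixel class map of shape (H, W)
```

In this sketch the gating weights depend on both decoder outputs at each pixel, so the balance between the transformer and CNN streams can vary across the image; whether the published model scales per pixel, per channel, or globally is not stated in the abstract.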

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/dbd4/11204546/e02026ef6922/jimaging-10-00125-g001.jpg