MResTNet: A Multi-Resolution Transformer Framework with CNN Extensions for Semantic Segmentation.

Author information

Detsikas Nikolaos, Mitianoudis Nikolaos, Pratikakis Ioannis

Affiliation

Electrical and Computer Engineering Department, Democritus University of Thrace, University Campus Xanthi-Kimmeria, 67100 Xanthi, Greece.

Publication information

J Imaging. 2024 May 21;10(6):125. doi: 10.3390/jimaging10060125.

DOI:10.3390/jimaging10060125

PMID:38921602

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC11204546/

Abstract

A fundamental task in computer vision is differentiating and identifying the objects or entities in a visual scene using semantic segmentation methods. Transformer networks have surpassed traditional convolutional neural network (CNN) architectures in segmentation performance. The continuous pursuit of optimal results on popular evaluation metrics has led to very large architectures that require significant computational power, making them prohibitive for real-time applications such as autonomous driving. In this paper, we propose a model that pairs a visual transformer encoder with a parallel twin decoder, consisting of a visual transformer decoder and a CNN decoder with multi-resolution connections. The two decoders are merged with the aid of two trainable CNN blocks: the fuser, which combines the information from the two decoders, and the scaler, which scales the contribution of each decoder. The proposed model achieves state-of-the-art performance on the Cityscapes and ADE20K datasets while maintaining a low-complexity network that can be used in real-time applications.
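The merge step described in the abstract can be illustrated with a small sketch. The abstract does not specify how the fuser and scaler are built or in which order they are applied, so everything below is an assumption made for illustration: the scaler is modeled as a softmax over two trainable scalars, the fuser as a 1x1 convolution over the concatenated class scores of the two decoders, and the function names (`scale`, `fuse`, `predict`) are hypothetical. Plain Python lists stand in for tensors of shape height x width x classes.

```python
# Hypothetical sketch, NOT the authors' implementation. The abstract only says
# that a "fuser" combines the two decoder outputs and a "scaler" scales each
# decoder's contribution; the concrete operations below are assumptions.
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def scale(transformer_logits, cnn_logits, gate_params):
    """Assumed 'scaler': weight each decoder by a softmax over two scalars."""
    w_t, w_c = softmax(gate_params)
    scaled_t = [[[w_t * v for v in px] for px in row] for row in transformer_logits]
    scaled_c = [[[w_c * v for v in px] for px in row] for row in cnn_logits]
    return scaled_t, scaled_c


def fuse(scaled_t, scaled_c, weights, bias):
    """Assumed 'fuser': a 1x1 convolution over the channel-wise concatenation."""
    fused = []
    for row_t, row_c in zip(scaled_t, scaled_c):
        fused_row = []
        for px_t, px_c in zip(row_t, row_c):
            features = px_t + px_c  # concatenate class scores from both decoders
            fused_row.append([
                sum(w * f for w, f in zip(w_row, features)) + b
                for w_row, b in zip(weights, bias)
            ])
        fused.append(fused_row)
    return fused


def predict(fused):
    """Semantic segmentation output: the argmax class label for every pixel."""
    return [[max(range(len(px)), key=px.__getitem__) for px in row] for row in fused]


# Toy input: a 1x2 image with 2 classes, one logit map per decoder.
transformer_logits = [[[2.0, 0.0], [0.0, 1.0]]]
cnn_logits = [[[0.0, 1.0], [0.0, 3.0]]]

# Equal gate parameters give each decoder a weight of 0.5.
scaled_t, scaled_c = scale(transformer_logits, cnn_logits, [0.0, 0.0])

# Fuser weights chosen so that each output class sums the matching class
# score from both decoders (2 output classes x 4 concatenated inputs).
fuser_weights = [[1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 1.0]]
fused = fuse(scaled_t, scaled_c, fuser_weights, [0.0, 0.0])
labels = predict(fused)
```

In a trained network the gate parameters and fuser weights would be learned jointly with the encoder and both decoders; here they are fixed by hand only to keep the example easy to follow.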

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/dbd4/11204546/e02026ef6922/jimaging-10-00125-g001.jpg

Similar articles

MS-TCNet: An effective Transformer-CNN combined network using multi-scale feature learning for 3D medical image segmentation.

Comput Biol Med. 2024 Mar;170:108057. doi: 10.1016/j.compbiomed.2024.108057. Epub 2024 Jan 28.

O-Net: A Novel Framework With Deep Fusion of CNN and Transformer for Simultaneous Segmentation and Classification.

Front Neurosci. 2022 Jun 2;16:876065. doi: 10.3389/fnins.2022.876065. eCollection 2022.

MCV-UNet: a modified convolution & transformer hybrid encoder-decoder network with multi-scale information fusion for ultrasound image semantic segmentation.

PeerJ Comput Sci. 2024 Jun 24;10:e2146. doi: 10.7717/peerj-cs.2146. eCollection 2024.

Dual encoder network with transformer-CNN for multi-organ segmentation.

Med Biol Eng Comput. 2023 Mar;61(3):661-671. doi: 10.1007/s11517-022-02723-9. Epub 2022 Dec 29.

ETUNet: Exploring efficient transformer enhanced UNet for 3D brain tumor segmentation.

Comput Biol Med. 2024 Mar;171:108005. doi: 10.1016/j.compbiomed.2024.108005. Epub 2024 Jan 23.

Hybrid CNN-Transformer Network With Circular Feature Interaction for Acute Ischemic Stroke Lesion Segmentation on Non-Contrast CT Scans.

IEEE Trans Med Imaging. 2024 Jun;43(6):2303-2316. doi: 10.1109/TMI.2024.3362879. Epub 2024 Jun 3.

TSCA-Net: Transformer based spatial-channel attention segmentation network for medical images.

Comput Biol Med. 2024 Mar;170:107938. doi: 10.1016/j.compbiomed.2024.107938. Epub 2024 Jan 3.

MixSeg: a lightweight and accurate mix structure network for semantic segmentation of apple leaf disease in complex environments.

Front Plant Sci. 2023 Sep 13;14:1233241. doi: 10.3389/fpls.2023.1233241. eCollection 2023.

Robust Automated Tumour Segmentation Network Using 3D Direction-Wise Convolution and Transformer.

J Imaging Inform Med. 2024 Oct;37(5):2444-2453. doi: 10.1007/s10278-024-01131-9. Epub 2024 May 9.

References cited by this article

Prototype-Based Semantic Segmentation.

IEEE Trans Pattern Anal Mach Intell. 2024 Oct;46(10):6858-6872. doi: 10.1109/TPAMI.2024.3387116. Epub 2024 Sep 5.

MultiResUNet: Rethinking the U-Net architecture for multimodal biomedical image segmentation.

Neural Netw. 2020 Jan;121:74-87. doi: 10.1016/j.neunet.2019.08.025. Epub 2019 Sep 4.
