Systematic Review of Hybrid Vision Transformer Architectures for Radiological Image Analysis.

Authors

Kim Ji Woong, Khan Aisha Urooj, Banerjee Imon

Affiliations

School of Computing, Informatics, and Decision Systems Engineering, Arizona State University, Tempe, AZ, USA.

Department of Radiology, Mayo Clinic, Phoenix, AZ, USA.

Publication information

J Imaging Inform Med. 2025 Jan 27. doi: 10.1007/s10278-024-01322-4.

DOI: 10.1007/s10278-024-01322-4
PMID: 39871042

Abstract

Vision transformers (ViTs) and convolutional neural networks (CNNs) each possess distinct strengths in medical imaging: ViTs excel at capturing long-range dependencies through self-attention, while CNNs are adept at extracting local features via spatial convolution filters. ViTs may struggle to capture detailed local spatial information, which is critical for tasks such as anomaly detection in medical imaging, whereas shallow CNNs often fail to abstract global context effectively. This study explores and evaluates hybrid architectures that integrate ViT and CNN to leverage their complementary strengths for enhanced performance in medical vision tasks such as segmentation, classification, reconstruction, and prediction. Following the PRISMA guidelines, a systematic review was conducted on 34 articles published between 2020 and September 2024 that proposed novel hybrid ViT-CNN architectures specifically for medical imaging tasks in radiology. The review analyzed architectural variations, merging strategies between ViT and CNN, innovative applications of ViT, and efficiency metrics, including number of parameters, inference cost (GFLOPs), and performance benchmarks. It found that integrating ViT and CNN can mitigate the limitations of each architecture, offering comprehensive solutions that combine global context understanding with precise local feature extraction. We benchmarked the articles on these criteria and derived a ranked list. By synthesizing the current literature, this review defines fundamental concepts of hybrid vision transformers and highlights emerging trends in the field. It provides a clear direction for future research aimed at optimizing the integration of ViT and CNN for effective use in medical imaging, contributing to advances in diagnostic accuracy and image analysis.
We performed a systematic review of hybrid vision transformer architectures following the PRISMA guidelines and carried out a thorough comparative analysis to benchmark the architectures.
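The efficiency metrics named in the abstract (parameter count and GFLOPs) can be estimated by hand for the two building blocks of a hybrid model. The sketch below is illustrative only and is not taken from the reviewed article: the function names are hypothetical, it assumes the common convention that one multiply-accumulate (MAC) equals two floating-point operations, and it ignores softmax, normalization, and activation costs. Note that GFLOPs counts operations rather than wall-clock time, so it is only a proxy for inference time.

```python
def conv2d_params(c_in, c_out, k, bias=True):
    """Learnable parameters of a k x k 2D convolution layer."""
    return c_out * (c_in * k * k + (1 if bias else 0))


def conv2d_macs(c_in, c_out, k, h_out, w_out):
    """Multiply-accumulates for one forward pass (bias additions ignored)."""
    return c_in * c_out * k * k * h_out * w_out


def self_attention_params(d, bias=True):
    """Parameters of the Q, K, V, and output projections (each d x d)."""
    return 4 * (d * d + (d if bias else 0))


def self_attention_macs(n_tokens, d):
    """Multiply-accumulates for self-attention over n_tokens tokens of width d.

    Four projections cost 4 * n * d^2; the attention scores (Q K^T) and the
    weighted sum over V each cost n^2 * d, summed across all heads.
    """
    return 4 * n_tokens * d * d + 2 * n_tokens * n_tokens * d


def gflops(macs):
    """Convert MACs to GFLOPs, assuming 1 MAC = 2 FLOPs."""
    return 2 * macs / 1e9


if __name__ == "__main__":
    # Assumed example sizes: a 3x3 convolution on a 224x224 map with 64
    # channels, and attention over 14x14 = 196 patch tokens of width 768.
    print("conv params:", conv2d_params(64, 64, 3))
    print("conv GFLOPs:", gflops(conv2d_macs(64, 64, 3, 224, 224)))
    print("attn params:", self_attention_params(768))
    print("attn GFLOPs:", gflops(self_attention_macs(196, 768)))
```

The quadratic `n_tokens` term in `self_attention_macs` is why attention becomes costly on high-resolution inputs, while the convolution cost grows only linearly with the number of output pixels. That trade-off is one reason hybrid designs often apply convolutions at high resolution and attention on a downsampled token grid.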
