Analyzing to discover origins of CNNs and ViT architectures in medical images.

Affiliations

Department of Artificial Intelligence, Ajou University, Suwon, South Korea.

Department of Convergence Medicine, Asan Medical Center, University of Ulsan College of Medicine, Seoul, South Korea.

Publication information

Sci Rep. 2024 Apr 16;14(1):8755. doi: 10.1038/s41598-024-58382-3.

DOI:10.1038/s41598-024-58382-3

PMID:38627477

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11021435/

Abstract

In this paper, we present an in-depth analysis of CNN and ViT architectures on medical images, with the goal of providing insights for subsequent research directions. In particular, the origins of deep neural network performance should be explainable for medical images, but few studies have examined such explainability from the perspective of network architecture. Therefore, we investigate the origin of model performance, which is a clue to explaining deep neural networks, focusing on the two most relevant architectures, CNNs and ViT. We conduct four analyses, covering (1) robustness in a noisy environment, (2) consistency of the translation invariance property, (3) visual recognition with obstructed images, and (4) whether the acquired features derive from shape or texture, so that we can compare the origins of the differences in visual recognition performance between CNNs and ViT. Furthermore, we explore how medical images differ from generic images under these analyses. We find that medical images, unlike generic ones, are class-sensitive. Finally, we propose a straightforward ensemble method based on our analyses, demonstrating that our findings can support follow-up studies. Our analysis code will be publicly available.
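
The first three analyses listed in the abstract (noise, translation, occlusion) share one pattern: perturb the input, then check whether the predicted label changes. The sketch below illustrates that pattern only; it is not the authors' code. Every name in it (`add_gaussian_noise`, `translate`, `occlude`, `consistency`, `toy_predict`) is hypothetical, images are plain nested lists to avoid dependencies, and the classifier is a toy stand-in for a trained CNN or ViT.

```python
import random


def add_gaussian_noise(image, sigma, seed=0):
    """Add zero-mean Gaussian noise to a 2D image with values in [0, 1]."""
    rng = random.Random(seed)
    return [[min(1.0, max(0.0, px + rng.gauss(0.0, sigma))) for px in row]
            for row in image]


def translate(image, dx, dy):
    """Shift a 2D image by dx columns and dy rows, zero-filling exposed pixels."""
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w:
                out[ny][nx] = image[y][x]
    return out


def occlude(image, top, left, size):
    """Zero out a size x size square whose top-left corner is (top, left)."""
    return [[0.0 if top <= y < top + size and left <= x < left + size else px
             for x, px in enumerate(row)]
            for y, row in enumerate(image)]


def consistency(predict, images, perturb):
    """Fraction of images whose predicted label survives the perturbation."""
    same = sum(predict(img) == predict(perturb(img)) for img in images)
    return same / len(images)


def toy_predict(image):
    """Hypothetical stand-in classifier: 0 if the left half is brighter, else 1."""
    w = len(image[0])
    left = sum(sum(row[: w // 2]) for row in image)
    right = sum(sum(row[w // 2:]) for row in image)
    return 0 if left >= right else 1


# Two synthetic 4x4 "images": one bright on the left, one bright on the right.
images = [
    [[1.0, 1.0, 0.0, 0.0]] * 4,
    [[0.0, 0.0, 1.0, 1.0]] * 4,
]

noise_score = consistency(toy_predict, images, lambda im: add_gaussian_noise(im, 0.1))
shift_score = consistency(toy_predict, images, lambda im: translate(im, 2, 0))
occlusion_score = consistency(toy_predict, images, lambda im: occlude(im, 0, 0, 2))
```

Replacing `toy_predict` with a wrapper around a real model would, under these assumptions, give one consistency score per perturbation type. Those scores could then be compared across architectures and, in line with the class-sensitivity finding, broken down per class.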
