Department of Artificial Intelligence, Ajou University, Suwon, South Korea.
Department of Convergence Medicine, Asan Medical Center, University of Ulsan College of Medicine, Seoul, South Korea.
Sci Rep. 2024 Apr 16;14(1):8755. doi: 10.1038/s41598-024-58382-3.
In this paper, we present an in-depth analysis of CNN and ViT architectures on medical images, with the goal of providing insights for subsequent research directions. In particular, the behavior of deep neural networks on medical images should be explainable, yet there has been a paucity of studies examining such explainability from the perspective of network architecture. We therefore investigate the origins of model performance, which serve as clues for explaining deep neural networks, focusing on the two most relevant architectures: CNNs and ViT. We conduct four analyses, covering (1) robustness in noisy environments, (2) consistency of the translation-invariance property, (3) visual recognition with occluded images, and (4) whether features are acquired from shape or texture, and thereby compare the architectural origins of the differences in visual recognition performance between CNNs and ViT. Furthermore, we explore the discrepancies between medical and generic images with respect to these analyses. We find that medical images, unlike generic ones, exhibit class-sensitive behavior. Finally, we propose a straightforward ensemble method based on our analyses, demonstrating that our findings can support follow-up studies. Our analysis code will be made publicly available.
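The abstract does not specify how the proposed CNN/ViT ensemble is formed. The sketch below is a minimal illustration only, assuming equal-weight averaging of softmax outputs from generic torchvision backbones (resnet50 and vit_b_16); these choices are assumptions for illustration, not the authors' published method.

```python
# Minimal sketch of a CNN/ViT ensemble by softmax averaging.
# NOTE: the backbones and the equal-weight averaging are assumptions for
# illustration; the paper only states that a "straightforward ensemble
# method" is proposed, without detailing it in the abstract.
import torch
import torchvision.models as models

cnn = models.resnet50(weights=None)   # stand-in CNN backbone
vit = models.vit_b_16(weights=None)   # stand-in ViT backbone
cnn.eval()
vit.eval()

@torch.no_grad()
def ensemble_predict(images: torch.Tensor) -> torch.Tensor:
    """Average the class-probability outputs of the two architectures."""
    p_cnn = torch.softmax(cnn(images), dim=1)
    p_vit = torch.softmax(vit(images), dim=1)
    return (p_cnn + p_vit) / 2

# Usage: a batch of 224x224 RGB images (random here for demonstration).
batch = torch.randn(4, 3, 224, 224)
pred_classes = ensemble_predict(batch).argmax(dim=1)
```

Averaging probabilities rather than logits keeps the two models' contributions on a comparable scale; in practice the backbones would be fine-tuned on the target medical-imaging task before ensembling.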