Department of Mechanical Engineering, K. N. Toosi University of Technology, Tehran, Iran.
Physiology Research Center, Iran University of Medical Sciences, Tehran, Iran.
Sci Rep. 2024 Aug 3;14(1):18007. doi: 10.1038/s41598-024-69119-7.
In this study, we investigated the potential of the Vision Transformer (ViT) for medical image analysis. Diagnosing osteoporosis from X-ray images is an important classification problem, which we addressed using Vision Transformer models. As a basis for comparison, we solved the same problem in parallel with traditional convolutional neural networks (CNNs), which are well-known and widely used for image classification. Our findings indicate that ViT can achieve better results than CNNs. Furthermore, given a sufficient amount of training data, both methods become more likely to arrive at better solutions to such critical problems.