Division of Medical AI Research and Development, National Cancer Center Research Institute, 5-1-1 Tsukiji, Chuo-ku, Tokyo, 104-0045, Japan.
Cancer Translational Research Team, RIKEN Center for Advanced Intelligence Project, 1-4-1 Nihonbashi, Chuo-ku, Tokyo, 103-0027, Japan.
J Med Syst. 2024 Sep 12;48(1):84. doi: 10.1007/s10916-024-02105-8.
In the rapidly evolving field of medical image analysis using artificial intelligence (AI), selecting an appropriate computational model is critical for accurate diagnosis and patient care. This literature review provides a comprehensive comparison of vision transformers (ViTs) and convolutional neural networks (CNNs), the two leading deep learning techniques in medical imaging. We surveyed the literature systematically, paying particular attention to the robustness, computational efficiency, scalability, and accuracy of these models in handling complex medical datasets. The review incorporates findings from 36 studies, which indicate a collective trend: transformer-based models, particularly ViTs, show significant potential across diverse medical imaging tasks and perform better than conventional CNN models. The findings also indicate that pre-training is important for transformer applications. We expect this work to help researchers and practitioners select the most appropriate model for a specific medical image analysis task, taking into account the current state of the art and future trends in the field.