Key Lab of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing 100190, China.
Russell H. Morgan Department of Radiology and Radiological Science, Johns Hopkins Medical Institutes, Baltimore, MD, USA.
Med Image Anal. 2023 Apr;85:102762. doi: 10.1016/j.media.2023.102762. Epub 2023 Jan 31.
The Transformer, one of the latest technological advances in deep learning, has gained prevalence in natural language processing and computer vision. Since medical imaging bears some resemblance to computer vision, it is natural to ask about the status quo of Transformers in medical imaging and pose the question: can Transformer models transform medical imaging? In this paper, we attempt to answer this question. After a brief introduction to the fundamentals of Transformers, especially in comparison with convolutional neural networks (CNNs), and a highlight of the key defining properties that characterize Transformers, we offer a comprehensive review of state-of-the-art Transformer-based approaches for medical imaging and present current research progress in medical image segmentation, recognition, detection, registration, reconstruction, enhancement, and related areas. In particular, what distinguishes our review is its organization around the Transformer's key defining properties, which are mostly derived from comparing the Transformer with the CNN, and around its architecture type, which specifies the manner in which the Transformer and the CNN are combined, all of which helps readers understand the rationale behind the reviewed approaches. We conclude with a discussion of future perspectives.
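To make the abstract's central comparison concrete, the following is a minimal sketch (not taken from the paper) contrasting the two building blocks the review compares: a convolution, which mixes each position with a small fixed neighborhood using the same learned kernel everywhere, and self-attention, which lets every position attend to every other position with content-dependent weights. The tensor sizes and layer parameters are illustrative assumptions only.

```python
# Illustrative sketch: local, content-independent convolution vs.
# global, content-dependent self-attention on the same feature map.
import torch
import torch.nn as nn

B, C, H, W = 1, 64, 16, 16           # batch, channels, spatial size (assumed for illustration)
x = torch.randn(B, C, H, W)

# CNN branch: a 3x3 convolution mixes each position with its 8 neighbors only,
# applying the same learned kernel at every location.
conv = nn.Conv2d(C, C, kernel_size=3, padding=1)
y_conv = conv(x)                      # (B, C, H, W)

# Transformer branch: flatten the H*W positions into tokens, then let every
# token attend to every other token; the attention weights depend on the input content.
tokens = x.flatten(2).transpose(1, 2)             # (B, H*W, C)
attn = nn.MultiheadAttention(embed_dim=C, num_heads=8, batch_first=True)
y_attn, weights = attn(tokens, tokens, tokens)    # (B, H*W, C), (B, H*W, H*W)

print(y_conv.shape, y_attn.shape, weights.shape)
```

The global receptive field and input-dependent weighting shown by the attention branch are exactly the kinds of defining properties the review uses to organize Transformer-based medical imaging approaches and to explain how Transformer and CNN components are combined.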