IEEE J Biomed Health Inform. 2024 Mar;28(3):1398-1411. doi: 10.1109/JBHI.2023.3348436. Epub 2024 Mar 6.
Medical imaging is a key component of clinical diagnosis, treatment planning and clinical trial design, accounting for almost 90% of all healthcare data. Convolutional neural networks (CNNs) have achieved performance gains in medical image analysis (MIA) in recent years. CNNs can efficiently model local pixel interactions and can be trained on small-scale MI data. Despite these important advances, typical CNNs have relatively limited capacity to model "global" pixel interactions, which restricts their ability to generalise to out-of-distribution data with different "global" information. Recent progress in artificial intelligence gave rise to Transformers, which can learn global relationships from data. However, full Transformer models need to be trained on large-scale data and involve tremendous computational complexity. Attention and Transformer compartments ("Transf/Attention"), which preserve the ability to model global relationships, have been proposed as lighter alternatives to full Transformers. Recently, there has been an increasing trend to cross-pollinate the complementary local-global properties of CNN and Transf/Attention architectures, ushering in a new era of hybrid models. The past years have witnessed substantial growth in hybrid CNN-Transf/Attention models across diverse MIA problems. In this systematic review, we survey existing hybrid CNN-Transf/Attention models, review and unravel key architectural designs, analyse breakthroughs, and evaluate current and future opportunities as well as challenges. We also introduce an analysis framework on generalisation opportunities of scientific and clinical impact, which can stimulate new data-driven domain generalisation and adaptation methods.
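To make the local-global distinction concrete, below is a minimal, hypothetical PyTorch sketch of one hybrid block: a convolution models local pixel interactions, then multi-head self-attention over the flattened feature map models global interactions. This is an illustrative pattern only, not an architecture from the surveyed papers; the class name, channel count and head count are assumptions.

```python
# Illustrative sketch (hypothetical, not from the paper): a hybrid block
# combining a convolution (local pixel interactions) with multi-head
# self-attention over flattened spatial tokens (global interactions).
import torch
import torch.nn as nn

class HybridConvAttentionBlock(nn.Module):
    def __init__(self, channels: int, num_heads: int = 4):
        super().__init__()
        # Local branch: 3x3 convolution models neighbourhood interactions.
        self.conv = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.norm = nn.LayerNorm(channels)
        # Global branch: self-attention relates every spatial position
        # to every other position.
        self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        x = torch.relu(self.conv(x))              # local features
        tokens = x.flatten(2).transpose(1, 2)     # (B, H*W, C) token sequence
        tokens = self.norm(tokens)
        attn_out, _ = self.attn(tokens, tokens, tokens)
        tokens = tokens + attn_out                # residual global mixing
        return tokens.transpose(1, 2).reshape(b, c, h, w)

# Example: a batch of 2 feature maps with 32 channels.
block = HybridConvAttentionBlock(channels=32)
out = block(torch.randn(2, 32, 16, 16))
print(out.shape)  # torch.Size([2, 32, 16, 16])
```

The sequential conv-then-attention ordering shown here is only one of the design choices the surveyed hybrid models explore; parallel branches and interleaved stages are common alternatives.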