School of Artificial Intelligence, Henan University, Zhengzhou, 450046, China.
Comput Biol Med. 2023 Sep;164:107304. doi: 10.1016/j.compbiomed.2023.107304. Epub 2023 Jul 31.
Deep learning (DL) algorithms based on brain MRI images have achieved great success in predicting Alzheimer's disease (AD), with classification accuracy exceeding even that of the most experienced clinical experts. As a novel feature fusion method, the Transformer has achieved excellent performance in many computer vision tasks, which has in turn driven its adoption in medical imaging. However, when a Transformer is used for 3D MRI feature fusion, existing DL models treat all input local features equally, which is inconsistent with the fact that adjacent voxels have stronger semantic connections than spatially distant ones. Moreover, because medical imaging datasets are relatively small, treating all input features equally makes it difficult to capture local lesion features within limited training iterations. This paper proposes Conv-Swinformer, a deep learning model that focuses on extracting and integrating local fine-grained features. Conv-Swinformer consists of a CNN module and a Transformer encoder module. The CNN module summarizes the planar features of the MRI slices, and the Transformer module establishes semantic connections among these planar features in 3D space. By introducing a shifted-window attention mechanism into the Transformer encoder, attention is restricted to small spatial regions of the MRI image, which effectively suppresses irrelevant background semantics and enables the model to capture local features more accurately. In addition, enlarging the attention window layer by layer further integrates local fine-grained features, strengthening the model's attention capability. Compared with DL algorithms that fuse local MRI features indiscriminately, Conv-Swinformer extracts local lesion features at a finer granularity and thus achieves better classification results.
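To illustrate the shifted-window attention idea described above, the following is a minimal NumPy sketch, not the authors' implementation. It assumes a simplified 1D setting: each MRI slice has already been summarized by a CNN into a feature vector, and self-attention is restricted to fixed-size windows over the slice sequence, first with a regular partition and then with a cyclic shift so that information can flow across window boundaries. The function name `window_attention` and the identity q/k/v projections are illustrative simplifications.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def window_attention(tokens, window, shift=0):
    """Self-attention restricted to fixed-size windows of tokens.

    tokens: (N, d) array of per-slice features; N must be divisible by window.
    shift:  cyclic shift applied before partitioning (Swin-style)."""
    n, d = tokens.shape
    x = np.roll(tokens, -shift, axis=0)           # cyclic shift of the sequence
    wins = x.reshape(n // window, window, d)      # partition into windows
    # identity projections for simplicity: q = k = v = the window tokens
    scores = wins @ wins.transpose(0, 2, 1) / np.sqrt(d)
    out = softmax(scores, axis=-1) @ wins         # attend only within each window
    return np.roll(out.reshape(n, d), shift, axis=0)  # undo the shift

# toy run: 8 "slice" feature vectors, windows of 4, then shifted by 2
rng = np.random.default_rng(0)
feats = rng.standard_normal((8, 16))
y = window_attention(feats, window=4)             # regular window partition
y = window_attention(y, window=4, shift=2)        # shifted windows
print(y.shape)
```

Restricting the score matrix to each window keeps attention local and cheap; alternating regular and shifted partitions, and enlarging `window` in deeper layers, is what lets local lesion features be integrated progressively, as the abstract describes.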