Wang Shuai, Liu Lei, Wang Jun, Peng Xinyue, Liu Baosen
School of Computer Science and Technology, Huaibei Normal University, Huaibei, China.
Huaibei Key Laboratory of Digital Multimedia Intelligent Information Processing, Huaibei, China.
PeerJ Comput Sci. 2024 Dec 3;10:e2563. doi: 10.7717/peerj-cs.2563. eCollection 2024.
Transformer-based technology has attracted widespread attention in medical image segmentation. Due to the diversity of organs, effectively modeling multi-scale information and establishing long-range dependencies between pixels are crucial for successful medical image segmentation. However, most studies rely on a fixed single-scale window for modeling, which ignores the potential impact of window size on performance. This limitation can hinder window-based models' ability to fully explore multi-scale and long-range relationships within medical images. To address this issue, we propose a multi-scale reconfiguration self-attention (MSR-SA) module that accurately models multi-scale information and long-range dependencies in medical images. The MSR-SA module first divides the attention heads into multiple groups, each assigned an ascending dilation rate. These groups are then uniformly split into several non-overlapping local windows. Using dilated sampling, we gather the same number of keys to obtain both long-range and multi-scale information. Finally, dynamic information fusion is achieved by integrating features from the sampling points at corresponding positions across different windows. Based on the MSR-SA module, we propose a multi-scale reconfiguration U-Net (MSR-UNet) framework for medical image segmentation. Experiments on the Synapse and automated cardiac diagnosis challenge (ACDC) datasets show that MSR-UNet can achieve satisfactory segmentation results. The code is available at https://github.com/davidsmithwj/MSR-UNet (DOI: 10.5281/zenodo.13969855).
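The key-gathering step described above can be illustrated with a minimal NumPy sketch. This is not the authors' implementation (the released code is at the GitHub link above); the function name and shapes are hypothetical. It shows the core idea: each head group, assigned a larger dilation rate, samples its local window at a wider stride, so every group collects the same number of keys per window while covering a progressively larger spatial extent.

```python
import numpy as np

def dilated_window_keys(x, window=4, dilations=(1, 2)):
    """Illustrative sketch of multi-scale key gathering.

    For each head group (one per dilation rate r), partition the
    feature map into non-overlapping windows of spatial extent
    window * r, then sample that extent with stride r. Every group
    therefore gathers the same number of keys (window * window) per
    window, but larger r covers longer-range context.

    x: (H, W, C) feature map.
    Returns: one array per group, shape (num_windows, window*window, C).
    """
    H, W, C = x.shape
    groups = []
    for r in dilations:
        span = window * r  # spatial extent covered by one dilated window
        keys = []
        for i in range(0, H - span + 1, span):
            for j in range(0, W - span + 1, span):
                # Dilated sampling: take every r-th pixel inside the span.
                patch = x[i:i + span:r, j:j + span:r, :]
                keys.append(patch.reshape(-1, C))
        groups.append(np.stack(keys))
    return groups

# On an 8x8 feature map with window=4: the r=1 group yields 4 windows,
# the r=2 group yields 1 window spanning the whole map; both gather
# 16 keys per window.
x = np.arange(8 * 8 * 1, dtype=np.float64).reshape(8, 8, 1)
g1, g2 = dilated_window_keys(x, window=4, dilations=(1, 2))
```

In the actual MSR-SA module, attention is then computed per group and the features at corresponding sampling positions across windows are fused dynamically; this sketch covers only the sampling geometry.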