Li Guangju, Jin Dehu, Yu Qi, Zheng Yuanjie, Qi Meng
School of Information Science and Engineering, Shandong Normal University, Jinan, China.
Med Phys. 2024 Feb;51(2):1178-1189. doi: 10.1002/mp.16662. Epub 2023 Aug 1.
Accurate medical image segmentation is crucial for disease diagnosis and surgical planning. Transformer networks offer a promising alternative for medical image segmentation as they can learn global features through self-attention mechanisms. To further enhance performance, many researchers have incorporated more Transformer layers into their models. However, this approach often results in the model parameters increasing significantly, causing a potential rise in complexity. Moreover, the datasets of medical image segmentation usually have fewer samples, which leads to the risk of overfitting of the model.
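As background, the global receptive field that self-attention provides can be sketched minimally. This is an illustrative single-head version without learned query/key/value projections; the token count and dimension are assumptions, not values from the paper:

```python
import numpy as np

def self_attention(x):
    """Scaled dot-product self-attention over a sequence of patch tokens.
    Every output token is a weighted mixture of ALL input tokens, which is
    why one attention layer already has a global receptive field.
    x: (n_tokens, d) array of token embeddings."""
    d = x.shape[1]
    scores = x @ x.T / np.sqrt(d)                 # pairwise token similarities
    scores -= scores.max(axis=1, keepdims=True)   # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=1, keepdims=True)       # softmax over all tokens
    return attn @ x                               # global mixture of tokens

# Hypothetical example: 49 tokens from a 7x7 patch grid, embedding size 8.
tokens = np.random.default_rng(1).standard_normal((49, 8))
out = self_attention(tokens)
print(out.shape)  # (49, 8)
```

In a real Transformer layer, `x` would first be projected to separate queries, keys, and values, usually across several heads; the mixing pattern shown here is the same.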
This paper aims to design a medical image segmentation model that has fewer parameters and can effectively alleviate overfitting.
We design a MultiIB-Transformer structure consisting of a single Transformer layer and multiple information bottleneck (IB) blocks. The Transformer layer captures long-distance spatial relationships to extract global feature information. The IB blocks compress noise and improve model robustness. The advantage of this structure is that it achieves state-of-the-art (SOTA) performance with only one Transformer layer, significantly reducing the number of model parameters. In addition, we design a new skip-connection structure: with only two 1×1 convolutions, the high-resolution feature map can effectively carry both semantic and spatial information, thereby alleviating the semantic gap.
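The skip-connection idea can be sketched as follows: a 1×1 convolution is simply a per-pixel linear map over channels, so two of them can project an encoder map (spatial detail) and a decoder map (semantic content) to a common width before fusion. The shapes and the element-wise-addition fusion here are assumptions for illustration, since the abstract does not specify them:

```python
import numpy as np

def conv1x1(x, weight):
    """1x1 convolution: an independent linear map at every pixel.
    x: (c_in, h, w) feature map; weight: (c_out, c_in)."""
    return np.tensordot(weight, x, axes=([1], [0]))  # -> (c_out, h, w)

rng = np.random.default_rng(0)
enc = rng.standard_normal((16, 32, 32))  # high-res encoder map: spatial detail
dec = rng.standard_normal((64, 32, 32))  # upsampled decoder map: semantics

# Two 1x1 convolutions project both maps to a common channel width,
# then fusion lets the skip carry semantic AND spatial information.
w_enc = rng.standard_normal((32, 16)) * 0.1
w_dec = rng.standard_normal((32, 64)) * 0.1
fused = conv1x1(enc, w_enc) + conv1x1(dec, w_dec)
print(fused.shape)  # (32, 32, 32)
```

Because the kernel is 1×1, this adds almost no parameters compared with the 3×3 convolutions typically used in decoder skip paths, which is consistent with the paper's goal of a small model.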
On the Breast UltraSound Images (BUSI) dataset, the proposed model achieves an IoU of 67.75 and an F1 score of 87.78. On the Synapse multi-organ segmentation dataset, it achieves a parameter count (Param) of 22.30, a Hausdorff Distance (HD) of 20.04, and a Dice Similarity Coefficient (DSC) of 81.83.
Our proposed model (MultiIB-TransUNet) achieved superior results with fewer parameters compared to other models.