Qiong Liu, Chaofan Li, Jinnan Teng, Liping Chen, Jianxiang Song
School of Medical Imaging, Jiangsu Medical College, Yancheng, 224005, Jiangsu, China.
Affiliated Hospital 6 of Nantong University, Yancheng Third People's Hospital, Yancheng, 224001, Jiangsu, China.
Sci Rep. 2025 Jan 22;15(1):2833. doi: 10.1038/s41598-025-86315-1.
Convolutional Neural Networks (CNNs) have achieved remarkable accuracy in medical image segmentation tasks. However, the Vision Transformer (ViT) model, with its capability of extracting global information, offers a significant advantage in capturing contextual information compared to the limited receptive field of the convolutional kernels in CNNs. Despite this, ViT models struggle to fully detect and extract high-frequency signals, such as textures and boundaries, in medical images. These high-frequency features are essential in medical imaging, as targets such as tumors and pathological organs exhibit significant differences in texture and boundaries across different stages. Additionally, the high resolution of medical images makes the self-attention mechanism of ViTs computationally expensive. To address these limitations, we propose a medical image segmentation network framework based on frequency-domain decomposition with a Laplacian pyramid. The approach selectively computes attention over the high-frequency components of the original image, effectively enhancing spatial structural information. During attention computation, we introduce Singular Value Decomposition (SVD) to extract a compact representation matrix from the original image, which is then used for the linear projections within the attention mechanism. This reduces computational complexity while preserving essential features. We demonstrate the validity and superiority of our model on the Abdominal Multi-Organ Segmentation dataset and the Dermatological Disease dataset; on the Synapse dataset, our model achieves a Dice score of 82.68 and a Hausdorff distance (HD) of 17.23 mm. Experimental results indicate that our model consistently delivers effective segmentation and improved accuracy across datasets.
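As a concrete illustration of the frequency-domain split the abstract describes, the following is a minimal PyTorch sketch, not the authors' implementation: a Gaussian low-pass filter produces the coarse band of a Laplacian pyramid level, and the residual against an upsampled copy is the high-frequency band carrying textures and boundaries. The kernel size, sigma, and input shape are illustrative assumptions.

```python
# Minimal sketch of one Laplacian-pyramid level: the high-frequency band is
# the residual between an image and an upsampled low-pass copy of itself.
# Kernel size and sigma are illustrative assumptions, not values from the paper.
import torch
import torch.nn.functional as F

def gaussian_kernel(size: int = 5, sigma: float = 1.0) -> torch.Tensor:
    """Build a normalized 2-D Gaussian kernel for low-pass filtering."""
    coords = torch.arange(size, dtype=torch.float32) - (size - 1) / 2
    g = torch.exp(-(coords ** 2) / (2 * sigma ** 2))
    k = torch.outer(g, g)
    return k / k.sum()

def laplacian_split(x: torch.Tensor, size: int = 5, sigma: float = 1.0):
    """Split a (B, C, H, W) image into a low-frequency and a high-frequency band."""
    b, c, h, w = x.shape
    k = gaussian_kernel(size, sigma).to(x).view(1, 1, size, size).repeat(c, 1, 1, 1)
    low = F.conv2d(x, k, padding=size // 2, groups=c)        # depthwise Gaussian blur
    down = F.interpolate(low, scale_factor=0.5, mode="bilinear", align_corners=False)
    up = F.interpolate(down, size=(h, w), mode="bilinear", align_corners=False)
    high = x - up                                             # Laplacian residual: edges, textures
    return down, high

x = torch.randn(1, 1, 224, 224)        # e.g. a grayscale CT slice
low, high = laplacian_split(x)
print(low.shape, high.shape)           # (1, 1, 112, 112) and (1, 1, 224, 224)
```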
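The SVD-based projection can likewise be sketched. The snippet below is a speculative reading of the abstract rather than the paper's exact design: it takes the top-k left singular vectors of the token matrix as a rank-k basis and compresses keys and values before attention, reducing the cost from O(N^2) to O(Nk). The rank k, the weight shapes, and the svd_reduced_attention helper are all hypothetical.

```python
# Sketch of SVD-reduced attention: a truncated SVD of the token matrix supplies
# a rank-k basis used to compress keys and values. The rank k and the placement
# of the projection are assumptions, not the paper's exact design.
import torch
import torch.nn.functional as F

def svd_reduced_attention(x: torch.Tensor, wq, wk, wv, k: int = 32):
    """x: (N, d) token matrix; wq/wk/wv: (d, d) projection weights."""
    # Rank-k basis from the input tokens: x ~= U_k S_k V_k^T.
    u, _, _ = torch.linalg.svd(x, full_matrices=False)
    basis = u[:, :k]                       # (N, k), top-k left singular vectors
    q = x @ wq                             # (N, d): queries keep full length N
    k_r = basis.T @ (x @ wk)               # (k, d): keys compressed to rank k
    v_r = basis.T @ (x @ wv)               # (k, d): values compressed to rank k
    attn = F.softmax(q @ k_r.T / q.shape[-1] ** 0.5, dim=-1)  # (N, k) instead of (N, N)
    return attn @ v_r                      # (N, d) output

n, d = 196, 64                             # e.g. 14x14 patch tokens of width 64
x = torch.randn(n, d)
w = [torch.randn(d, d) / d ** 0.5 for _ in range(3)]
out = svd_reduced_attention(x, *w)
print(out.shape)                           # torch.Size([196, 64])
```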