Zhang Chong, Wang Lingtong, Wei Guohui, Kong Zhiyong, Qiu Min
School of Intelligence and Information Engineering, Shandong University of Traditional Chinese Medicine, Jinan, China.
Department of Ultrasound Medicine, Affiliated Hospital of Shandong University of Traditional Chinese Medicine, Jinan, China.
Front Physiol. 2024 Sep 27;15:1432987. doi: 10.3389/fphys.2024.1432987. eCollection 2024.
Ultrasound imaging has become a crucial tool in medical diagnostics, offering real-time visualization of internal organs and tissues. However, challenges such as low contrast, high noise levels, and variability in image quality hinder accurate interpretation. To enhance diagnostic accuracy and support treatment decisions, precise segmentation of organs and lesions in ultrasound images is essential. Recently, several deep learning methods, including convolutional neural networks (CNNs) and Transformers, have reached significant milestones in medical image segmentation. Nonetheless, there remains a pressing need for methods capable of seamlessly integrating global context with local fine-grained information, particularly in addressing the unique challenges posed by ultrasound images.
To address these issues, we propose DDTransUNet, a hybrid network combining Transformer and CNN components, with a dual-branch encoder and a dual attention mechanism for ultrasound image segmentation. DDTransUNet adopts a Swin Transformer branch and a CNN branch to extract global context and local fine-grained information, respectively. The dual attention mechanism comprises Global Spatial Attention (GSA) and Global Channel Attention (GCA) modules that capture long-range visual dependencies. A novel Cross Attention Fusion (CAF) module then fuses the feature maps from both branches using cross-attention.
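The core idea of cross-attention fusion between two encoder branches can be sketched roughly as follows. This is a minimal single-head NumPy illustration, not the paper's CAF implementation: learned query/key/value projections, multi-head structure, and normalization layers are all omitted, and the function name is our own.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention_fuse(feat_a, feat_b):
    """Fuse two branch feature maps via (simplified) cross-attention.

    feat_a: (N_a, C) tokens from one branch (e.g., Swin Transformer),
            used as queries.
    feat_b: (N_b, C) tokens from the other branch (e.g., CNN),
            used as keys and values.
    Returns (N_a, C) features: each query token becomes a weighted
    combination of the other branch's tokens.
    """
    d_k = feat_a.shape[-1]
    scores = feat_a @ feat_b.T / np.sqrt(d_k)  # (N_a, N_b) similarities
    attn = softmax(scores, axis=-1)            # each row sums to 1
    return attn @ feat_b                       # (N_a, C) fused features
```

In a real model, both branches would typically attend to each other (queries swapped in a second pass) and the results would be projected and merged with the original features via residual connections.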
Experiments on three ultrasound image datasets demonstrate that DDTransUNet outperforms previous methods. On the TN3K dataset, DDTransUNet achieves IoU, Dice, HD95, and ACC of 73.82%, 82.31%, 16.98 mm, and 96.94%, respectively; on the BUS-BRA dataset, 80.75%, 88.23%, 8.12 mm, and 98.00%; and on the CAMUS dataset, 82.51%, 90.33%, 2.82 mm, and 96.87%.
These results indicate that our method can provide valuable diagnostic assistance to clinical practitioners.