Interventional Imaging Laboratory (LIVE), Software and IT Engineering Department, École de technologie supérieure, 1100 Notre-Dame Street West, Montreal, Quebec H3C 1K3, Canada.
Biomed Phys Eng Express. 2024 Sep 13;10(6). doi: 10.1088/2057-1976/ad7594.
Echocardiography is one of the most commonly used imaging modalities for the diagnosis of congenital heart disease. Echocardiographic image analysis is crucial for obtaining accurate cardiac anatomy information. Semantic segmentation models can precisely delimit the borders of the left ventricle, enabling accurate and automatic identification of the region of interest, which can be extremely useful for cardiologists. In the field of computer vision, convolutional neural network (CNN) architectures remain dominant. Existing CNN approaches have proved highly efficient for the segmentation of various medical images over the past decade. However, these solutions usually struggle to capture long-range dependencies, especially in images with objects of different scales and complex structures. In this study, we present an efficient method for semantic segmentation of echocardiographic images that overcomes these challenges by leveraging the self-attention mechanism of the Transformer architecture. The proposed solution extracts long-range dependencies and efficiently processes objects at different scales, improving performance across a variety of tasks. We introduce Shifted Windows Transformer models (Swin Transformers), which encode both the content of anatomical structures and the relationships between them. Our solution combines the Swin Transformer and U-Net architectures, producing a U-shaped variant. The proposed method is trained and validated on the EchoNet-Dynamic dataset. The results show an accuracy of 0.97, a Dice coefficient of 0.87, and an Intersection over Union (IoU) of 0.78. Swin Transformer models are promising for semantic segmentation of echocardiographic images and may help cardiologists automatically analyze and measure complex echocardiographic images.
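The Dice coefficient and IoU reported above are standard overlap metrics between a predicted segmentation mask and the ground-truth mask. The following is a minimal NumPy sketch of how these two metrics are conventionally computed for binary masks; it is an illustration of the metric definitions, not the authors' evaluation code, and the toy masks are hypothetical.

```python
import numpy as np

def dice_and_iou(pred, target, eps=1e-7):
    """Compute Dice coefficient and IoU for two binary masks.

    Dice = 2|A ∩ B| / (|A| + |B|);  IoU = |A ∩ B| / |A ∪ B|.
    """
    pred = pred.astype(bool)
    target = target.astype(bool)
    inter = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    dice = 2.0 * inter / (pred.sum() + target.sum() + eps)
    iou = inter / (union + eps)
    return dice, iou

# Toy example (hypothetical masks): two overlapping 4x4 squares on an 8x8 grid
pred = np.zeros((8, 8)); pred[2:6, 2:6] = 1
target = np.zeros((8, 8)); target[3:7, 3:7] = 1
dice, iou = dice_and_iou(pred, target)
```

Note that Dice is always at least as large as IoU for the same pair of masks, which is consistent with the reported 0.87 Dice versus 0.78 IoU.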
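The Swin Transformer's key idea, as used in the abstract, is to compute self-attention within local non-overlapping windows and to cyclically shift those windows between layers so that information flows across window boundaries. The sketch below illustrates only the window partitioning and cyclic shift on a NumPy feature map; the function names and sizes are illustrative, and this is not the paper's implementation (which builds a full U-shaped Swin/U-Net hybrid).

```python
import numpy as np

def window_partition(x, window_size):
    """Split a (H, W, C) feature map into non-overlapping windows,
    returning shape (num_windows, window_size, window_size, C)."""
    H, W, C = x.shape
    x = x.reshape(H // window_size, window_size, W // window_size, window_size, C)
    return x.transpose(0, 2, 1, 3, 4).reshape(-1, window_size, window_size, C)

def cyclic_shift(x, shift):
    """Cyclically shift the feature map so that the next attention layer's
    windows straddle the previous layer's window boundaries."""
    return np.roll(x, shift=(-shift, -shift), axis=(0, 1))

# Hypothetical 8x8 single-channel feature map, 4x4 windows, shift of 2
x = np.arange(8 * 8, dtype=np.float32).reshape(8, 8, 1)
plain_windows = window_partition(x, 4)                  # 4 windows of 4x4
shifted_windows = window_partition(cyclic_shift(x, 2), 4)
```

Self-attention would then be applied independently inside each window, which keeps the cost linear in image size while the alternating shift restores long-range connectivity across layers.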