Wu Hanqiong, Qu Gangrong, Xiao Zhifeng, Chunyu Fan
Internal Medicine, The First Hospital of Jinzhou Medical University, Jinzhou, 121001, China.
Cardiovascular Medicine, Chongqing General Hospital of the Armed Police Force, Chongqing, 400061, China.
Heliyon. 2024 Jul 17;10(15):e34845. doi: 10.1016/j.heliyon.2024.e34845. eCollection 2024 Aug 15.
Echocardiography is a key tool for diagnosing cardiac diseases, and accurate left ventricular (LV) segmentation in echocardiographic videos is crucial for assessing cardiac function. However, semantic segmentation of video must account for the temporal correlation between frames, which makes the task challenging. This article introduces an innovative method that incorporates a modified mixed attention mechanism into the SegFormer architecture, enabling it to effectively capture the temporal correlation present in video data. The proposed method processes the video frame by frame: each frame is passed through the encoder to obtain the current time feature map. This map, together with the historical time feature map, is then fed into a time-sensitive convolutional block attention module (TCBAM), a mixed attention mechanism. Its output serves both as the fused combination of the current and historical time feature maps for the current frame and as the historical time feature map for the subsequent frame. The processed feature map is then passed to the multilayer perceptron (MLP) decoder and subsequent layers to generate the final segmented image. Extensive experiments were conducted on three datasets: Hamad Medical Corporation, Tampere University, and Qatar University (HMC-QU); Cardiac Acquisitions for Multi-structure Ultrasound Segmentation (CAMUS); and Sunnybrook Cardiac Data (SCD). The method achieves a Dice coefficient of 97.92% on the SCD dataset and an F1 score of 0.9263 on the CAMUS dataset, outperforming all compared models. This research provides a promising solution to the temporal modeling challenge in transformer-based video semantic segmentation and indicates a direction for future research in this field.
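To make the frame-by-frame fusion concrete, the following is a minimal PyTorch sketch of a time-sensitive CBAM-style block in the spirit of the TCBAM described above: the current-frame feature map and a historical feature map are merged, passed through standard CBAM channel and spatial attention, and the fused output is reused as the history for the next frame. All module and parameter names here are hypothetical illustrations; the paper's exact design may differ.

```python
import torch
import torch.nn as nn


class TCBAM(nn.Module):
    """Sketch of a time-sensitive CBAM-style fusion block (hypothetical)."""

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        # 1x1 conv to merge current and historical feature maps before attention.
        self.fuse = nn.Conv2d(2 * channels, channels, kernel_size=1)
        # Shared MLP for channel attention over the fused features.
        self.channel_mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )
        # 7x7 conv over pooled channel statistics for spatial attention (standard CBAM).
        self.spatial_conv = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def forward(self, current: torch.Tensor, history: torch.Tensor):
        # current, history: (B, C, H, W) feature maps from the encoder.
        x = self.fuse(torch.cat([current, history], dim=1))

        # Channel attention: avg- and max-pooled descriptors through the shared MLP.
        b, c, _, _ = x.shape
        avg = self.channel_mlp(x.mean(dim=(2, 3)))
        mx = self.channel_mlp(x.amax(dim=(2, 3)))
        x = x * torch.sigmoid(avg + mx).view(b, c, 1, 1)

        # Spatial attention: channel-wise mean/max maps through the 7x7 conv.
        s = torch.cat([x.mean(dim=1, keepdim=True),
                       x.amax(dim=1, keepdim=True)], dim=1)
        x = x * torch.sigmoid(self.spatial_conv(s))

        # The fused output doubles as the historical feature map for the next frame.
        return x, x.detach()


# Per-video use: the history is initialized to zeros and carried across frames
# before the fused map is handed to the MLP decoder.
if __name__ == "__main__":
    block = TCBAM(channels=64)
    frames = torch.randn(8, 1, 64, 28, 28)  # (T, B, C, H, W) toy sequence
    history = torch.zeros(1, 64, 28, 28)
    for feat in frames:
        fused, history = block(feat, history)
```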