Liu Yatong, Zhu Yu, Xin Ying, Zhang Yanan, Yang Dawei, Xu Tao
School of Information Science and Engineering, East China University of Science and Technology, Shanghai 200237, China.
School of Information Science and Engineering, East China University of Science and Technology, Shanghai 200237, China; Shanghai Engineering Research Center of Internet of Things for Respiratory Medicine, Shanghai 200237, China.
Comput Methods Programs Biomed. 2023 May;233:107493. doi: 10.1016/j.cmpb.2023.107493. Epub 2023 Mar 17.
Transformers, which benefit from the global information modeling afforded by the self-attention mechanism, have recently achieved remarkable performance in computer vision. In this study, a novel transformer-based network for medical image segmentation, called the multi-scale embedding spatial transformer (MESTrans), is proposed.
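The abstract does not give implementation details of the SATrans structure; as a generic, hedged illustration of the scaled dot-product self-attention that such transformer blocks build on (not the paper's exact design), a minimal NumPy sketch might look like this, where the projection matrices `wq`, `wk`, `wv` are hypothetical learned parameters:

```python
import numpy as np

def self_attention(x, wq, wk, wv):
    """Scaled dot-product self-attention over a sequence of feature vectors.

    x:          (n, d) sequence of n tokens (e.g. flattened image patches)
    wq, wk, wv: (d, d) query/key/value projections (hypothetical parameters)
    """
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(x.shape[1])         # (n, n) pairwise similarities
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)  # row-wise softmax
    return weights @ v                             # each token mixes global context

rng = np.random.default_rng(0)
n, d = 16, 8
x = rng.standard_normal((n, d))
out = self_attention(x, *(rng.standard_normal((d, d)) for _ in range(3)))
```

Because every output token is a weighted sum over all input tokens, the receptive field is global by construction, which is the property the abstract contrasts with the local receptive fields of CNNs.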
First, a dataset called COVID-DS36 was created from 4369 computed tomography (CT) images of 36 patients from a partner hospital, of whom 18 had COVID-19 and 18 did not. Subsequently, a novel medical image segmentation network was proposed that introduces a self-attention mechanism to overcome the inherent limitations of convolutional neural networks (CNNs) and is capable of adaptively extracting discriminative information from both global and local content. Specifically, based on U-Net, a multi-scale embedding block (MEB) and a multi-layer spatial attention transformer (SATrans) structure were designed, which dynamically adjust the receptive field according to the input content. The spatial relationships between multi-level, multi-scale image patches were modeled, and global context information was captured effectively. To make the network concentrate on salient feature regions, a feature fusion module (FFM) was established that performs global learning and soft selection between shallow and deep features, adaptively combining encoder and decoder features. Four datasets comprising CT images, magnetic resonance (MR) images, and hematoxylin and eosin (H&E)-stained slide images were used to assess the performance of the proposed network.
Experiments were performed using four different types of medical image datasets. For the COVID-DS36 dataset, our method achieved a Dice similarity coefficient (DSC) of 81.23%. For the GlaS dataset, 89.95% DSC and 82.39% intersection over union (IoU) were obtained. On the Synapse dataset, the average DSC was 77.48% and the average Hausdorff distance (HD) was 31.69 mm. For the I2CVB dataset, 92.3% DSC and 85.8% IoU were obtained.
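The Dice similarity coefficient and intersection over union reported above are standard overlap metrics for binary segmentation masks; they can be computed as follows (a generic reference implementation, not the authors' evaluation code):

```python
import numpy as np

def dice_iou(pred, target):
    """Dice similarity coefficient and intersection over union for binary masks."""
    pred, target = pred.astype(bool), target.astype(bool)
    inter = np.logical_and(pred, target).sum()
    dice = 2.0 * inter / (pred.sum() + target.sum())
    iou = inter / np.logical_or(pred, target).sum()
    return dice, iou

pred = np.array([[1, 1, 0], [0, 1, 0]])
target = np.array([[1, 0, 0], [0, 1, 1]])
dice, iou = dice_iou(pred, target)
# intersection = 2, |pred| = |target| = 3, union = 4
# → dice = 4/6 ≈ 0.667, iou = 2/4 = 0.5
```

Note that DSC and IoU are monotonically related (DSC = 2·IoU / (1 + IoU)), which is why both rise and fall together across the datasets; the Hausdorff distance (HD) instead measures the worst-case boundary deviation in millimeters.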
The experimental results demonstrate that the proposed model has excellent generalization ability and outperforms other state-of-the-art methods. It is expected to be a potent tool to assist clinicians in diagnosis and to promote the development of medical intelligence technology.