Beijing Advanced Innovation Center for Big Data-Based Precision Medicine, School of Medicine and Engineering, No.37 Xueyuan Road, Haidian District, Beijing, China; Key Laboratory of Big Data-Based Precision Medicine, Ministry of Industry and Information Technology, No.37 Xueyuan Road, Haidian District, Beijing, China; School of Automation Science and Electrical Engineering, Beihang University, No.37 Xueyuan Road, Haidian District, Beijing, China.
Comput Methods Programs Biomed. 2023 Mar;230:107348. doi: 10.1016/j.cmpb.2023.107348. Epub 2023 Jan 12.
COVID-19 is a serious threat to human health. Traditional convolutional neural networks (CNNs) can perform medical image segmentation, while transformers are used for machine vision tasks because they capture long-range relationships better than CNNs. Combining CNNs and transformers for semantic segmentation has therefore attracted intense research interest. However, segmenting medical images remains challenging when data sets are limited, as is the case for COVID-19.
This study proposes a lightweight transformer+CNN model whose encoder sub-network uses a two-path design that captures both the global dependencies of image features and the low-level spatial details. Extracting image features jointly with a CNN and MobileViT reduces the computational cost and complexity of the model while improving segmentation performance. The model is therefore named Mini-MobileViT-Seg (MMViT-Seg). In addition, a multi query attention (MQA) module is proposed to fuse multi-scale features from different levels of the decoder sub-network, further improving the performance of the model. MQA simultaneously fuses multi-input, multi-scale low-level and high-level feature maps and is trained end to end under ground-truth supervision.
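The abstract does not specify how MQA is built, so the following is only a minimal sketch of the general idea of attention-weighted fusion of multi-scale feature maps, not the authors' module. It assumes nested-list feature maps of shape H x W x C, nearest-neighbour upsampling to a common resolution, and a per-pixel softmax over scales; the names `upsample_nearest`, `softmax`, and `fuse_scales` are hypothetical.

```python
import math


def upsample_nearest(fmap, out_h, out_w):
    """Nearest-neighbour resize of an H x W x C nested-list feature map."""
    in_h, in_w = len(fmap), len(fmap[0])
    return [
        [fmap[y * in_h // out_h][x * in_w // out_w] for x in range(out_w)]
        for y in range(out_h)
    ]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_scales(query_map, feature_maps):
    """Fuse feature maps of different resolutions with per-pixel attention.

    query_map:    H x W x C map used as the query (assumption: a decoder
                  feature already at the target resolution).
    feature_maps: list of h_i x w_i x C maps from other levels.
    Returns an H x W x C map in which every pixel is a softmax-weighted
    average, over scales, of the resized feature vectors at that pixel.
    """
    out_h, out_w = len(query_map), len(query_map[0])
    channels = len(query_map[0][0])
    resized = [upsample_nearest(f, out_h, out_w) for f in feature_maps]

    fused = []
    for y in range(out_h):
        row = []
        for x in range(out_w):
            q = query_map[y][x]
            keys = [r[y][x] for r in resized]
            # Scaled dot-product score between the query and each scale.
            scores = [
                sum(a * b for a, b in zip(q, k)) / math.sqrt(channels)
                for k in keys
            ]
            weights = softmax(scores)
            row.append(
                [sum(w * k[c] for w, k in zip(weights, keys))
                 for c in range(channels)]
            )
        fused.append(row)
    return fused
```

In this sketch the scale whose feature vector aligns best with the query at a given pixel receives the largest weight there. A trained module would normally add learned projections for queries, keys, and values, which are omitted here for brevity.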
Two-class infection labeling experiments were conducted on three datasets. The results show that the proposed model achieves the best performance and the smallest number of parameters among five popular semantic segmentation algorithms. In multi-class infection labeling, the proposed model also achieved competitive performance.
The proposed MMViT-Seg was tested on three COVID-19 segmentation datasets, and the results show that it performs better than the other models. In addition, the proposed MQA module, which effectively fuses multi-scale features from different levels, further improves the segmentation accuracy.