Xiao Xin, Dong Suyu, Yu Yang, Li Yan, Yang Guangyuan, Qiu Zhaowen
College of Information and Computer Engineering, Northeast Forestry University, Harbin, China.
Department of Cardiovascular Surgery, Beijing Anzhen Hospital, Capital Medical University, Beijing, China.
Front Med (Lausanne). 2023 Mar 9;10:1114571. doi: 10.3389/fmed.2023.1114571. eCollection 2023.
The heart is a complex, non-rigidly moving organ in the human body, and quantitative analysis of its motion is critical for accurate diagnosis and treatment. Cardiovascular magnetic resonance imaging (CMRI) enables a detailed quantitative evaluation for cardiac diagnosis. Because tissue structures vary considerably across medical images, deformable image registration (DIR) has become a vital task in biomedical image analysis. Recently, models based on the masked autoencoder (MAE) have been shown to be effective in computer vision tasks: the Vision Transformer's context-aggregation ability allows it to restore the semantic information of original image regions by predicting masked image patches from a small proportion of visible ones. This study proposes a novel MAE-based Transformer-ConvNet architecture for medical image registration. The core of the Transformer is designed as a masked autoencoder with a lightweight decoder, turning feature extraction for the downstream registration task into a self-supervised learning task. The study also rethinks how multi-head self-attention is computed in the Transformer encoder: we improve the query-key-value dot-product attention by introducing depthwise separable convolution (DWSC) and squeeze-and-excitation (SE) modules into the self-attention module, which reduces parameter computation while highlighting image details and preserving high-spatial-resolution features. In addition, a concurrent spatial and channel squeeze-and-excitation (scSE) module is embedded into the CNN structure, which also proves effective for extracting robust feature representations. The proposed method, called MAE-TransRNet, exhibits better generalization. The model is evaluated on the public cardiac short-axis dataset (with images and labels) from the 2017 Automated Cardiac Diagnosis Challenge (ACDC). Qualitative and quantitative results (e.g., Dice score and Hausdorff distance) suggest that the proposed model achieves superior results over state-of-the-art methods, indicating that MAE and the improved self-attention are effective and promising for medical image registration tasks. Codes and models are available at https://github.com/XinXiao101/MAE-TransRNet.
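For readers unfamiliar with the building blocks named in the abstract, the sketch below shows minimal PyTorch implementations of a DWSC layer and an scSE block, following the standard formulations from the literature. These are illustrative assumptions, not the authors' exact modules: the class names (DepthwiseSeparableConv, SCSEBlock) and the reduction ratio are hypothetical, and the official implementation lives in the repository linked above.

```python
import torch
import torch.nn as nn


class DepthwiseSeparableConv(nn.Module):
    """Depthwise separable convolution (DWSC): a per-channel depthwise conv
    followed by a pointwise (1x1) conv, cutting parameters versus a standard
    convolution of the same receptive field."""

    def __init__(self, in_ch: int, out_ch: int, kernel_size: int = 3):
        super().__init__()
        self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size,
                                   padding=kernel_size // 2, groups=in_ch)
        self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.pointwise(self.depthwise(x))


class SCSEBlock(nn.Module):
    """Concurrent spatial and channel squeeze-and-excitation (scSE).

    The cSE branch recalibrates channels from a globally pooled descriptor;
    the sSE branch recalibrates spatial positions via a 1x1 convolution.
    The two recalibrated maps are summed."""

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        mid = max(channels // reduction, 1)
        self.cse = nn.Sequential(           # channel recalibration
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, mid, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(mid, channels, kernel_size=1),
            nn.Sigmoid(),
        )
        self.sse = nn.Sequential(           # spatial recalibration
            nn.Conv2d(channels, 1, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x * self.cse(x) + x * self.sse(x)


if __name__ == "__main__":
    x = torch.randn(2, 64, 32, 32)  # (batch, channels, H, W)
    y = SCSEBlock(64)(DepthwiseSeparableConv(64, 64)(x))
    print(y.shape)  # torch.Size([2, 64, 32, 32])
```

Summing the channel-recalibrated and spatially recalibrated maps is what makes the recalibration "concurrent" in the sense the abstract uses; the DWSC layer is the usual means of reducing parameter computation in the attention path.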