IEEE Trans Med Imaging. 2024 Mar;43(3):994-1005. doi: 10.1109/TMI.2023.3326188. Epub 2024 Mar 5.
Hybrid transformer-based segmentation approaches have shown great promise in medical image analysis. However, they typically require considerable computational power and resources during both training and inference, posing a challenge for the resource-limited settings common in medical applications. To address this issue, we present an innovative framework called Slim UNETR, designed to balance accuracy and efficiency by leveraging the advantages of both convolutional neural networks and transformers. Our method features the Slim UNETR Block as its core component, which effectively enables information exchange through self-attention mechanism decomposition and cost-effective representation aggregation. Additionally, we use throughput as an efficiency indicator to provide feedback on model resource consumption. Our experiments demonstrate that Slim UNETR outperforms state-of-the-art models in accuracy, model size, and efficiency when deployed on resource-constrained devices. Remarkably, Slim UNETR achieves 92.44% Dice accuracy on BraTS2021 while being 34.6x smaller and 13.4x faster at inference than Swin UNETR. Code: https://github.com/aigzhusmart/Slim-UNETR.
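To make the efficiency claim concrete, the sketch below contrasts standard self-attention with a generic decomposed variant that pools keys and values to fewer tokens, shrinking the attention matrix from N x N to N x (N/stride). This is an illustrative NumPy sketch of the general cost-reduction idea, not the authors' Slim UNETR Block; the function names and the `stride` parameter are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def full_attention(x, wq, wk, wv):
    # Standard self-attention: the score matrix is N x N,
    # so cost grows quadratically with the token count N.
    q, k, v = x @ wq, x @ wk, x @ wv
    return softmax(q @ k.T / np.sqrt(k.shape[-1])) @ v

def slim_attention(x, wq, wk, wv, stride=4):
    # Illustrative decomposed attention (hypothetical, not the
    # exact Slim UNETR Block): keys/values come from tokens
    # mean-pooled by `stride`, so the score matrix is only
    # N x (N/stride) while the output keeps shape (N, d).
    q = x @ wq
    pooled = x.reshape(-1, stride, x.shape[-1]).mean(axis=1)
    k, v = pooled @ wk, pooled @ wv
    return softmax(q @ k.T / np.sqrt(k.shape[-1])) @ v

rng = np.random.default_rng(0)
n, d = 64, 16
x = rng.standard_normal((n, d))
wq, wk, wv = (rng.standard_normal((d, d)) for _ in range(3))
print(full_attention(x, wq, wk, wv).shape)        # (64, 16)
print(slim_attention(x, wq, wk, wv).shape)        # (64, 16)
```

With stride 4 the pooled variant evaluates a 64 x 16 score matrix instead of 64 x 64, a 4x reduction in attention cost while preserving the output shape.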