Shen Yiqing, Li Jingxing, Shao Xinyuan, Romillo Blanca Inigo, Jindal Ankush, Dreizin David, Unberath Mathias
Johns Hopkins University, Baltimore, MD 21218, USA.
University of Maryland School of Medicine and R Adams Cowley Shock Trauma Center, Baltimore, MD 21201, USA.
Med Image Comput Comput Assist Interv. 2024 Oct;15012:542-552. doi: 10.1007/978-3-031-72390-2_51. Epub 2024 Oct 23.
Segment anything models (SAMs) are gaining attention for their zero-shot generalization capability in segmenting objects of unseen classes and in unseen domains when properly prompted. Interactivity is a key strength of SAMs, allowing users to iteratively provide prompts that specify objects of interest to refine outputs. However, rapid inference is necessary to realize the interactive use of SAMs for 3D medical imaging tasks; high memory requirements and long processing delays remain constraints that hinder their adoption for this purpose. Specifically, while 2D SAMs applied to 3D volumes contend with repetitive computation to process all slices independently, 3D SAMs suffer from an exponential increase in model parameters and FLOPs. To address these challenges, we present FastSAM3D, which accelerates SAM inference to 8 milliseconds per 128 × 128 × 128 3D volumetric image on an NVIDIA A100 GPU. This speedup is accomplished through 1) a novel layer-wise progressive distillation scheme that enables knowledge transfer from a complex 12-layer ViT-B encoder to a lightweight 6-layer ViT-Tiny variant encoder without training from scratch; and 2) a novel 3D sparse flash attention that replaces vanilla attention operators, substantially reducing memory needs and improving parallelization. Experiments on three diverse datasets reveal that FastSAM3D achieves a remarkable speedup of 527.38× compared to 2D SAMs and 8.75× compared to 3D SAMs on the same volumes without significant performance decline. Thus, FastSAM3D opens the door to low-cost, truly interactive SAM-based 3D medical image segmentation on commonly used GPU hardware. Code is available at https://github.com/arcadelab/FastSAM3D.
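The following is a minimal PyTorch sketch of the layer-wise progressive distillation idea described above: a 6-layer student is supervised layer by layer with cached intermediate activations of the 12-layer teacher, so it need not be trained from scratch. All names (TinyEncoder, distill_step), the 2i+1 teacher-layer pairing, the MSE objective, and the projection head are illustrative assumptions rather than the FastSAM3D implementation; the progressive schedule (supervising shallower layers before deeper ones) is omitted for brevity.

import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyEncoder(nn.Module):
    # Stand-in for a 6-layer ViT-Tiny-style student encoder (hypothetical).
    def __init__(self, dim=192, depth=6, teacher_dim=768):
        super().__init__()
        self.blocks = nn.ModuleList(
            nn.TransformerEncoderLayer(dim, nhead=3, batch_first=True)
            for _ in range(depth)
        )
        # Project student features to the teacher width for the distillation loss.
        self.proj = nn.Linear(dim, teacher_dim)

    def forward(self, x):
        feats = []
        for blk in self.blocks:
            x = blk(x)
            feats.append(self.proj(x))
        return feats

def distill_step(student, optimizer, tokens, teacher_feats):
    # One step: student layer i mimics teacher layer 2i+1, one plausible
    # pairing of 6 student layers to 12 teacher layers (an assumption).
    loss = sum(
        F.mse_loss(s, teacher_feats[2 * i + 1].detach())
        for i, s in enumerate(student(tokens))
    )
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Example shapes: `tokens` would be flattened patch embeddings of a 3D volume,
# `teacher_feats` the 12 cached intermediate activations of the ViT-B teacher.
student = TinyEncoder()
optimizer = torch.optim.AdamW(student.parameters(), lr=1e-4)
tokens = torch.randn(1, 512, 192)
teacher_feats = [torch.randn(1, 512, 768) for _ in range(12)]
distill_step(student, optimizer, tokens, teacher_feats)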
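Similarly, one way to sketch a 3D sparse flash attention operator is to subsample keys and values over the flattened volume and delegate the dense computation to a fused flash-attention kernel, as below. The strided subsampling pattern and the function name are assumptions made for illustration; the actual FastSAM3D operator may use a different sparsity scheme.

import torch
import torch.nn.functional as F

def sparse_flash_attention_3d(q, k, v, stride=2):
    # q, k, v: (batch, heads, tokens, head_dim), where tokens = D*H*W after
    # flattening the 3D volume into a token sequence.
    # Sparsity: each query attends only to every `stride`-th key/value token
    # (an assumed pattern; the paper's scheme may differ).
    k_s, v_s = k[:, :, ::stride], v[:, :, ::stride]
    # PyTorch's fused kernel dispatches to FlashAttention when available,
    # avoiding materialization of the full attention matrix.
    return F.scaled_dot_product_attention(q, k_s, v_s)

# Example: a 128^3 volume patchified into 16^3 patches gives 8^3 = 512 tokens.
q = k = v = torch.randn(1, 6, 512, 64)
out = sparse_flash_attention_3d(q, k, v)  # shape (1, 6, 512, 64)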