Zhou Lei, Liu Huidong, Bae Joseph, He Junjun, Samaras Dimitris, Prasanna Prateek
Department of Computer Science, Stony Brook University, NY, USA.
Amazon, WA, USA.
Inf Process Med Imaging. 2023 Jun;13939:743-754. doi: 10.1007/978-3-031-34048-2_57. Epub 2023 Jun 8.
Although token sparsification has been applied to Vision Transformers (ViT) to accelerate classification, it is still unknown how to perform segmentation from sparse tokens. To this end, we reformulate segmentation as a sparse encoding → token completion → dense decoding (SCD) pipeline. We first show empirically that naïvely applying existing approaches from classification token pruning and masked image modeling (MIM) leads to failure and inefficient training, caused by inappropriate sampling algorithms and the low quality of the restored dense features. In this paper, we propose Soft-topK Token Pruning (STP) and Multi-layer Token Assembly (MTA) to address these problems. In sparse encoding, STP predicts token importance scores with a lightweight sub-network and samples the topK tokens. The intractable topK gradients are approximated through a continuous perturbed score distribution. In token completion, MTA restores a full token sequence by assembling both sparse output tokens and pruned multi-layer intermediate ones. The last stage, dense decoding, is compatible with existing segmentation decoders, e.g., UNETR. Experiments show that SCD pipelines equipped with STP and MTA are much faster than baselines without token pruning in both training (up to 120% higher throughput) and inference (up to 60.6% higher throughput) while maintaining segmentation quality. Code is available here: https://github.com/cvlab-stonybrook/TokenSparse-for-MedSeg.
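The two ideas in the abstract, sampling the topK tokens under a perturbed score distribution and restoring a full token sequence from kept and pruned tokens, can be illustrated with a toy sketch. This is not the authors' implementation (see the linked repository for that). The function names, the Gaussian noise, the Monte Carlo averaging, and the fixed input scores are all assumptions made for illustration; in the paper the scores come from a learned sub-network and the pruned tokens are taken from multiple intermediate layers.

```python
import random


def topk_mask(scores, k):
    """Hard top-k indicator: 1.0 for the k highest scores, 0.0 elsewhere."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    keep = set(order[:k])
    return [1.0 if i in keep else 0.0 for i in range(len(scores))]


def perturbed_topk(scores, k, sigma=0.05, n_samples=200, seed=0):
    """Average hard top-k masks over noisy copies of the scores.

    The hard mask is piecewise constant in the scores, so it carries no
    useful gradient. Averaging over Gaussian perturbations (an assumption
    here) gives a mask that changes smoothly as the scores change.
    """
    rng = random.Random(seed)
    soft = [0.0] * len(scores)
    for _ in range(n_samples):
        noisy = [s + sigma * rng.gauss(0.0, 1.0) for s in scores]
        for i, m in enumerate(topk_mask(noisy, k)):
            soft[i] += m / n_samples
    return soft


def assemble(kept_tokens, kept_idx, pruned_tokens, pruned_idx, n):
    """Rebuild a length-n sequence by putting every token back at its index."""
    full = [None] * n
    for i, tok in zip(kept_idx, kept_tokens):
        full[i] = tok
    for i, tok in zip(pruned_idx, pruned_tokens):
        full[i] = tok
    return full


# Hypothetical importance scores for five tokens; keep the top two.
scores = [0.9, 0.1, 0.52, 0.5, 0.05]
hard = topk_mask(scores, k=2)
soft = perturbed_topk(scores, k=2)

# Tokens pruned earlier are placed back next to the surviving ones.
restored = assemble(["a", "c"], [0, 2], ["b"], [1], n=3)
```

In this sketch, tokens 2 and 3 have nearly equal scores, so both receive a fractional weight in `soft`, whereas `hard` keeps only token 2. Every sampled mask selects exactly `k` tokens, so the soft mask still sums to `k`.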