Zhang Yuanhan, Zhou Kaiyang, Liu Ziwei
IEEE Trans Pattern Anal Mach Intell. 2025 Jul;47(7):5268-5280. doi: 10.1109/TPAMI.2024.3435939.
The size of vision models has grown exponentially over the last few years, especially after the emergence of the Vision Transformer. This has motivated the development of parameter-efficient tuning methods, such as learning adapter layers or visual prompt tokens, which allow only a tiny fraction of model parameters to be trained while the vast majority, obtained from pre-training, remain frozen. However, designing a proper tuning method is non-trivial: one might need to try out a lengthy list of design choices, not to mention that each downstream dataset often requires custom designs. In this paper, we view the existing parameter-efficient tuning methods as "prompt modules" and propose Neural prOmpt seArcH (NOAH), a novel approach that learns, for large vision models, the optimal design of prompt modules through a neural architecture search algorithm, specifically for each downstream dataset. By conducting extensive experiments on over 20 vision datasets, we demonstrate that NOAH (i) is superior to individual prompt modules, (ii) has good few-shot learning ability, and (iii) is domain-generalizable.
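To make the freeze-then-adapt setup concrete, below is a minimal PyTorch sketch of one such prompt module, a bottleneck adapter attached to a frozen block. This is an illustrative toy, not the paper's exact module or search space: the layer sizes, the stand-in backbone, and the names `Adapter`, `down`, and `up` are all assumptions for demonstration.

```python
import torch
import torch.nn as nn


class Adapter(nn.Module):
    """Bottleneck adapter: down-project, nonlinearity, up-project,
    added residually to the frozen block's output."""

    def __init__(self, dim: int, bottleneck: int = 8):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.up = nn.Linear(bottleneck, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.up(torch.relu(self.down(x)))


# A toy stand-in for one pre-trained Transformer block.
backbone_block = nn.Linear(16, 16)
for p in backbone_block.parameters():
    p.requires_grad = False  # the pre-trained weights stay frozen

adapter = Adapter(dim=16)  # only these few parameters are trained

x = torch.randn(2, 16)
out = adapter(backbone_block(x))

trainable = sum(p.numel() for p in adapter.parameters() if p.requires_grad)
frozen = sum(p.numel() for p in backbone_block.parameters())
```

Only `trainable` parameters (here 280, versus 272 frozen ones; in a real ViT the frozen count is orders of magnitude larger) receive gradient updates, which is what makes per-dataset search over such modules affordable.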