IEEE Trans Pattern Anal Mach Intell. 2023 Apr;45(4):5314-5321. doi: 10.1109/TPAMI.2022.3206148. Epub 2023 Mar 7.
We present ResMLP, an architecture built entirely upon multi-layer perceptrons for image classification. It is a simple residual network that alternates (i) a linear layer in which image patches interact, independently and identically across channels, and (ii) a two-layer feed-forward network in which channels interact independently per patch. When trained with a modern strategy using heavy data augmentation and optional distillation, it attains surprisingly good accuracy/complexity trade-offs on ImageNet. We also train ResMLP models in a self-supervised setup, further removing the prior that comes from using a labelled dataset. Finally, by adapting our model to machine translation, we achieve surprisingly good results. We share pre-trained models and our code based on the Timm library.
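The alternation the abstract describes can be illustrated with a toy NumPy sketch of one ResMLP block. This is an illustrative reimplementation, not the authors' Timm-based code: the paper's Affine normalization layers and per-layer scaling are omitted, and all weights and dimensions here are arbitrary placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)

def gelu(x):
    # tanh approximation of the GELU nonlinearity
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def resmlp_block(x, w_patch, w1, w2):
    """One simplified ResMLP block; x has shape (num_patches, channels).

    (i)  cross-patch linear layer: patches interact via a linear map over
         the patch dimension, applied identically to every channel;
    (ii) per-patch two-layer feed-forward network: channels interact via a
         linear map over the channel dimension, applied to each patch.
    Both sub-blocks are residual. The paper's Affine normalization is
    omitted for brevity.
    """
    # (i) linear over patches, identical across channels, with residual
    x = x + w_patch @ x
    # (ii) two-layer MLP over channels, independent per patch, with residual
    x = x + gelu(x @ w1) @ w2
    return x

# toy sizes: 16 patches, 8 channels, hidden dimension 32
n_patches, channels, hidden = 16, 8, 32
x = rng.standard_normal((n_patches, channels))
w_patch = 0.02 * rng.standard_normal((n_patches, n_patches))
w1 = 0.02 * rng.standard_normal((channels, hidden))
w2 = 0.02 * rng.standard_normal((hidden, channels))

y = resmlp_block(x, w_patch, w1, w2)
print(y.shape)  # (16, 8)
```

The block preserves the (patches, channels) shape, so such blocks can be stacked into a deep residual network and followed by average pooling and a linear classifier.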