Wang Yulin, Yue Yang, Lu Rui, Han Yizeng, Song Shiji, Huang Gao
IEEE Trans Pattern Anal Mach Intell. 2024 Dec;46(12):8036-8055. doi: 10.1109/TPAMI.2024.3401036. Epub 2024 Nov 6.
The superior performance of modern computer vision backbones (e.g., vision Transformers trained on ImageNet-1K/22K) usually comes with a costly training procedure. This study addresses this issue by generalizing the idea of curriculum learning beyond its original formulation, i.e., training models using easier-to-harder data. Specifically, we reformulate the training curriculum as a soft-selection function that uncovers progressively more difficult patterns within each example during training, instead of performing easier-to-harder sample selection. Our work is inspired by an intriguing observation on the learning dynamics of visual backbones: during the earlier stages of training, the model predominantly learns to recognize certain 'easier-to-learn' discriminative patterns in the data. Viewed in the frequency and spatial domains, these patterns consist of lower-frequency components and of natural image content free of distortion or data augmentation. Motivated by these findings, we propose a curriculum in which the model always leverages all the training data at every learning stage, yet is exposed first to the 'easier-to-learn' patterns of each example, with harder patterns gradually introduced as training progresses. To implement this idea in a computationally efficient way, we introduce a cropping operation in the Fourier spectrum of the inputs, enabling the model to learn from only the lower-frequency components. We then show that exposing the natural content of images can be readily achieved by modulating the intensity of data augmentation. Finally, we integrate these two aspects and design curriculum learning schedules by proposing tailored search algorithms. Moreover, we present useful techniques for deploying our approach efficiently in challenging practical scenarios, such as large-scale parallel training and limited input/output or data pre-processing speed. The resulting method, EfficientTrain++, is simple, general, yet surprisingly effective. As an off-the-shelf approach, it reduces the training time of various popular models (e.g., ResNet, ConvNeXt, DeiT, PVT, Swin, CSWin, and CAFormer) by [Formula: see text] on ImageNet-1K/22K without sacrificing accuracy. It also demonstrates efficacy in self-supervised learning (e.g., MAE).
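The low-frequency cropping described in the abstract can be illustrated with a short sketch. The following is a minimal PyTorch illustration, not the paper's released implementation: the function name `low_frequency_crop` and the intensity rescaling are assumptions made for exposition. The idea is to keep only a central band-by-band window of the centered Fourier spectrum and map it back to a smaller image that contains only the lower-frequency content.

```python
import torch

def low_frequency_crop(x: torch.Tensor, band: int) -> torch.Tensor:
    """Keep a central band x band window of the 2D Fourier spectrum of a
    batch of images x with shape (N, C, H, W), then map it back to pixel
    space. The output is a band x band image holding only low frequencies."""
    h, w = x.shape[-2:]
    spec = torch.fft.fftshift(torch.fft.fft2(x), dim=(-2, -1))  # DC at center
    top, left = (h - band) // 2, (w - band) // 2
    cropped = spec[..., top:top + band, left:left + band]       # crop spectrum
    img = torch.fft.ifft2(torch.fft.ifftshift(cropped, dim=(-2, -1)))
    # Rescale so mean intensity survives the size change (ifft2 normalizes
    # by band*band rather than h*w). This detail is an assumption.
    return img.real * (band * band) / (h * w)
```

Note that because the output has only band x band pixels, early training stages that use a small band also process smaller inputs, which is consistent with where the reported training-time savings come from.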
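The second ingredient, modulating the intensity of data augmentation over training, can likewise be sketched as a schedule over a standard augmentation policy. The linear ramp and the helper name below are illustrative assumptions, with torchvision's RandAugment standing in for whichever policy is actually scheduled:

```python
from torchvision.transforms import RandAugment

def augmentation_for_epoch(epoch: int, total_epochs: int,
                           max_magnitude: int = 9) -> RandAugment:
    # Hypothetical linear ramp: weak augmentation early in training
    # (near-natural, undistorted images), full strength by the end.
    frac = epoch / max(total_epochs - 1, 1)
    return RandAugment(magnitude=round(max_magnitude * frac))
```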