


EfficientTrain++: Generalized Curriculum Learning for Efficient Visual Backbone Training.

Authors

Wang Yulin, Yue Yang, Lu Rui, Han Yizeng, Song Shiji, Huang Gao

Publication

IEEE Trans Pattern Anal Mach Intell. 2024 Dec;46(12):8036-8055. doi: 10.1109/TPAMI.2024.3401036. Epub 2024 Nov 6.

DOI: 10.1109/TPAMI.2024.3401036
PMID: 38743547
Abstract

The superior performance of modern computer vision backbones (e.g., vision Transformers learned on ImageNet-1K/22K) usually comes with a costly training procedure. This study contributes to this issue by generalizing the idea of curriculum learning beyond its original formulation, i.e., training models using easier-to-harder data. Specifically, we reformulate the training curriculum as a soft-selection function, which uncovers progressively more difficult patterns within each example during training, instead of performing easier-to-harder sample selection. Our work is inspired by an intriguing observation on the learning dynamics of visual backbones: during the earlier stages of training, the model predominantly learns to recognize some 'easier-to-learn' discriminative patterns in the data. These patterns, when observed through frequency and spatial domains, incorporate lower-frequency components, and the natural image contents without distortion or data augmentation. Motivated by these findings, we propose a curriculum where the model always leverages all the training data at every learning stage, yet the exposure to the 'easier-to-learn' patterns of each example is initiated first, with harder patterns gradually introduced as training progresses. To implement this idea in a computationally efficient way, we introduce a cropping operation in the Fourier spectrum of the inputs, enabling the model to learn from only the lower-frequency components. Then we show that exposing the contents of natural images can be readily achieved by modulating the intensity of data augmentation. Finally, we integrate these two aspects and design curriculum learning schedules by proposing tailored searching algorithms. Moreover, we present useful techniques for deploying our approach efficiently in challenging practical scenarios, such as large-scale parallel training, and limited input/output or data pre-processing speed.
The resulting method, EfficientTrain++, is simple, general, yet surprisingly effective. As an off-the-shelf approach, it reduces the training time of various popular models (e.g., ResNet, ConvNeXt, DeiT, PVT, Swin, CSWin, and CAFormer) by [Formula: see text] on ImageNet-1K/22K without sacrificing accuracy. It also demonstrates efficacy in self-supervised learning (e.g., MAE).
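The Fourier-spectrum cropping the abstract describes (early training sees only the lower-frequency content of each image) can be sketched in a few lines. The following is an illustrative NumPy version, not the authors' released code: the function name `low_frequency_crop`, the centered-crop rule, and the intensity rescaling are assumptions made for the sketch.

```python
import numpy as np

def low_frequency_crop(image, ratio):
    """Keep only the central (low-frequency) part of the 2D Fourier
    spectrum of a grayscale (H, W) image, then invert back to pixel
    space. `ratio` in (0, 1] is the fraction of the spectrum retained
    along each axis; the output is a smaller, low-pass version of the
    input.
    """
    h, w = image.shape
    # Centered spectrum: after fftshift, low frequencies sit in the middle.
    spec = np.fft.fftshift(np.fft.fft2(image))
    ch, cw = int(h * ratio), int(w * ratio)
    top, left = (h - ch) // 2, (w - cw) // 2
    cropped = spec[top:top + ch, left:left + cw]
    # Inverse transform of the cropped spectrum; rescale so that mean
    # intensity is preserved despite the size change.
    out = np.fft.ifft2(np.fft.ifftshift(cropped)).real
    return out * (ch * cw) / (h * w)
```

Note that the result is not merely blurred but genuinely smaller than the input, which is what makes this formulation computationally cheap: early training steps operate on reduced-resolution tensors.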


Similar Articles

1
EfficientTrain++: Generalized Curriculum Learning for Efficient Visual Backbone Training.
IEEE Trans Pattern Anal Mach Intell. 2024 Dec;46(12):8036-8055. doi: 10.1109/TPAMI.2024.3401036. Epub 2024 Nov 6.
2
Foundation models in gastrointestinal endoscopic AI: Impact of architecture, pre-training approach and data efficiency.
Med Image Anal. 2024 Dec;98:103298. doi: 10.1016/j.media.2024.103298. Epub 2024 Aug 12.
3
Contrastive Self-Supervised Pre-Training for Video Quality Assessment.
IEEE Trans Image Process. 2022;31:458-471. doi: 10.1109/TIP.2021.3130536. Epub 2021 Dec 16.
4
Detecting floating litter in freshwater bodies with semi-supervised deep learning.
Water Res. 2024 Nov 15;266:122405. doi: 10.1016/j.watres.2024.122405. Epub 2024 Sep 11.
5
Detecting Glaucoma from Fundus Photographs Using Deep Learning without Convolutions: Transformer for Improved Generalization.
Ophthalmol Sci. 2022 Oct 19;3(1):100233. doi: 10.1016/j.xops.2022.100233. eCollection 2023 Mar.
6
Erratum: Eyestalk Ablation to Increase Ovarian Maturation in Mud Crabs.
J Vis Exp. 2023 May 26(195). doi: 10.3791/6561.
7
Deep convolutional neural network and IoT technology for healthcare.
Digit Health. 2024 Jan 17;10:20552076231220123. doi: 10.1177/20552076231220123. eCollection 2024 Jan-Dec.
8
On the Importance of Attention and Augmentations for Hypothesis Transfer in Domain Adaptation and Generalization.
Sensors (Basel). 2023 Oct 12;23(20):8409. doi: 10.3390/s23208409.
9
A Survey on Curriculum Learning.
IEEE Trans Pattern Anal Mach Intell. 2022 Sep;44(9):4555-4576. doi: 10.1109/TPAMI.2021.3069908. Epub 2022 Aug 4.
10
UniFormer: Unifying Convolution and Self-Attention for Visual Recognition.
IEEE Trans Pattern Anal Mach Intell. 2023 Oct;45(10):12581-12600. doi: 10.1109/TPAMI.2023.3282631. Epub 2023 Sep 5.

Cited By

1
A framework for measuring the training efficiency of a neural architecture.
Artif Intell Rev. 2024;57(12):349. doi: 10.1007/s10462-024-10943-8. Epub 2024 Oct 28.