
DILS: depth incremental learning strategy.

Author Information

Wang Yanmei, Han Zhi, Yu Siquan, Zhang Shaojie, Liu Baichen, Fan Huijie

Affiliations

State Key Laboratory of Robotics, Shenyang Institute of Automation, Chinese Academy of Sciences, Shenyang, China.

Institutes for Robotics and Intelligent Manufacturing, Chinese Academy of Sciences, Shenyang, China.

Publication Information

Front Neurorobot. 2024 Jan 8;17:1337130. doi: 10.3389/fnbot.2023.1337130. eCollection 2023.

Abstract

Various methods exist for transferring knowledge between neural networks, such as parameter transfer, feature sharing, and knowledge distillation. However, these methods are typically applied between networks of equal size or from larger networks to smaller ones. There is currently a lack of methods for transferring knowledge from shallower networks to deeper ones, which is crucial in real-world scenarios such as system upgrades, where network size increases for better performance. End-to-end training is the commonly used method for network training, but under this strategy the deeper network cannot inherit the knowledge of the existing shallower network. As a result, not only is the flexibility of the network limited, but a significant amount of computing power and time is also wasted. It is therefore imperative to develop new methods that enable knowledge transfer from shallower to deeper networks. To address this issue, we propose a depth incremental learning strategy (DILS). It starts from a shallower net and deepens it gradually by inserting new layers until the required performance is reached. We also derive an analytical method and a network approximation method for training the newly added parameters, guaranteeing that the new deeper net inherits the knowledge learned by the old shallower net. This enables knowledge transfer from smaller to larger networks and provides good initialization of the larger network's layers, stabilizing the performance of large models and accelerating their training. Its reasonability is guaranteed by information projection theory and verified by a series of synthetic- and real-data experiments.
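The core idea of gradual deepening with knowledge-preserving initialization can be illustrated with a minimal sketch. The paper's actual training methods (analytical and network approximation) are not reproduced here; instead, this hypothetical NumPy example uses identity initialization of the inserted layer, a simple way to make the deeper net compute the same function as the shallower net at insertion time, after which the new parameters would be fine-tuned.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

# A trained "shallow" net: two dense layers with a ReLU in between.
W1 = rng.normal(size=(8, 4))
W2 = rng.normal(size=(3, 8))

def shallow(x):
    return W2 @ relu(W1 @ x)

# Deepen the net by inserting a new layer initialized to the identity.
# Since the hidden activation h = relu(W1 @ x) is non-negative,
# relu(I @ h) == h, so the deeper net reproduces the shallow net exactly.
W_new = np.eye(8)  # identity initialization of the inserted layer

def deeper(x):
    h = relu(W1 @ x)
    h = relu(W_new @ h)  # inserted layer; a no-op at initialization
    return W2 @ h

x = rng.normal(size=4)
assert np.allclose(shallow(x), deeper(x))  # knowledge inherited at insertion
```

Starting training of the deeper net from such a function-preserving initialization, rather than from scratch, is what stabilizes performance and avoids re-learning what the shallower net already knows.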


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/374e/10800709/10522ec5804d/fnbot-17-1337130-g0001.jpg
