Baldi Pierre, Sadowski Peter
Department of Computer Science, University of California, Irvine, Irvine, CA 92697-3435.
Artif Intell. 2014 May;210:78-122. doi: 10.1016/j.artint.2014.02.004.
Dropout is a recently introduced algorithm for training neural networks by randomly dropping units during training to prevent their co-adaptation. A mathematical analysis of some of the static and dynamic properties of dropout is provided using Bernoulli gating variables, general enough to accommodate dropout on units or connections, and with variable rates. The framework allows a complete analysis of the ensemble averaging properties of dropout in linear networks, which is useful for understanding the non-linear case. The ensemble averaging properties of dropout in non-linear logistic networks result from three fundamental equations: (1) the approximation of the expectations of logistic functions by normalized geometric means, for which bounds and estimates are derived; (2) the algebraic equality between the normalized geometric mean of logistic functions and the logistic of the mean, which mathematically characterizes logistic functions; and (3) the linearity of the means with respect to sums, as well as products of independent variables. The results are also extended to other classes of transfer functions, including rectified linear functions. Approximation errors tend to cancel each other and do not accumulate. Dropout can also be connected to stochastic neurons and used to predict firing rates, and to backpropagation by viewing the backward propagation as ensemble averaging in a dropout linear network. Moreover, the convergence properties of dropout can be understood in terms of stochastic gradient descent. Finally, regarding the regularization properties of dropout, the expectation of the dropout gradient is the gradient of the corresponding approximation ensemble, regularized by an adaptive weight decay term with a propensity for self-consistent variance minimization and sparse representations.
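To make the first two fundamental equations concrete, the following is a minimal numerical sketch, not code from the paper; the network size n, keep-probability p, weights, and variable names are illustrative assumptions. It enumerates all Bernoulli dropout masks of a single logistic unit and checks that the normalized geometric mean (NGM) of the sub-network outputs equals the logistic of the expected input sum, which in turn approximates the true ensemble expectation.

```python
# Hypothetical sketch: for a single logistic unit under Bernoulli dropout,
# compare the exact ensemble expectation E[sigma(S)], the normalized geometric
# mean of the sub-network outputs, and the logistic of the mean input sum.
import itertools
import numpy as np

rng = np.random.default_rng(0)
n, p = 8, 0.5                      # number of inputs, keep-probability (assumed values)
w = rng.normal(size=n)             # fixed weights
x = rng.normal(size=n)             # fixed input activities

outputs, probs = [], []
for mask in itertools.product([0, 1], repeat=n):   # all 2^n dropout sub-networks
    m = np.array(mask)
    prob = np.prod(np.where(m == 1, p, 1 - p))     # Bernoulli probability of this mask
    s = np.dot(w, m * x)                           # input sum of the sub-network
    outputs.append(1.0 / (1.0 + np.exp(-s)))       # logistic output O = sigma(S)
    probs.append(prob)

O = np.array(outputs)
P = np.array(probs)

E_O = np.sum(P * O)                                # true ensemble expectation E[O]
G = np.prod(O ** P)                                # geometric mean of the outputs
G_c = np.prod((1.0 - O) ** P)                      # geometric mean of the complements
NGM = G / (G + G_c)                                # normalized geometric mean
E_S = p * np.dot(w, x)                             # expected input sum
sigma_E_S = 1.0 / (1.0 + np.exp(-E_S))             # logistic of the mean (weight scaling)

print(f"E[O]        = {E_O:.6f}")
print(f"NGM(O)      = {NGM:.6f}")                  # equals sigma(E[S]) exactly
print(f"sigma(E[S]) = {sigma_E_S:.6f}")            # approximates E[O]
```

In this single-unit setting the equality NGM(O) = sigma(E[S]) holds exactly, which is why deterministic inference with weights scaled by the keep-probability recovers the normalized geometric mean of the dropout ensemble; only the step E[O] ≈ NGM(O) is an approximation, for which the paper derives bounds and estimates.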