Improved Linear Convergence of Training CNNs With Generalizability Guarantees: A One-Hidden-Layer Case.

Author Information

Zhang Shuai, Wang Meng, Xiong Jinjun, Liu Sijia, Chen Pin-Yu

Publication Information

IEEE Trans Neural Netw Learn Syst. 2021 Jun;32(6):2622-2635. doi: 10.1109/TNNLS.2020.3007399. Epub 2021 Jun 2.

Abstract

We analyze the learning problem of one-hidden-layer nonoverlapping convolutional neural networks with the rectified linear unit (ReLU) activation function from the perspective of model estimation. The training outputs are assumed to be generated by the neural network with the unknown ground-truth parameters plus some additive noise, and the objective is to estimate the model parameters by minimizing a nonconvex squared loss function of the training data. Assuming that the training set contains a finite number of samples generated from the Gaussian distribution, we prove that the accelerated gradient descent (GD) algorithm with a proper initialization converges to the ground-truth parameters (up to the noise level) with a linear rate even though the learning problem is nonconvex. Moreover, the convergence rate is proved to be faster than the vanilla GD. The initialization can be achieved by the existing tensor initialization method. In contrast to the existing works that assume an infinite number of samples, we theoretically establish the sample complexity of the required number of training samples. Although the neural network considered here is not deep, this is the first work to show that accelerated GD algorithms can find the global optimizer of the nonconvex learning problem of neural networks. This is also the first work that characterizes the sample complexity of gradient-based methods in learning convolutional neural networks with the nonsmooth ReLU activation function. This work also provides the tightest bound so far of the estimation error with respect to the output noise.
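The following is a minimal NumPy sketch of the setup the abstract describes: a shared ReLU filter applied to nonoverlapping input patches, training labels generated from a ground-truth filter plus additive Gaussian noise, and a heavy-ball accelerated gradient descent loop on the empirical squared loss. The averaging second layer, the particular step size and momentum values, and the initialization in a neighborhood of the ground truth (a stand-in for the paper's tensor initialization) are assumptions for illustration, not the paper's exact formulation or constants.

```python
import numpy as np

# Sketch (illustrative, not the paper's exact model): one-hidden-layer
# nonoverlapping CNN with a shared ReLU filter w, output averaged over
# K patches.  Labels come from a ground-truth filter w_star plus noise;
# w is recovered by accelerated (heavy-ball) GD on the squared loss.

rng = np.random.default_rng(0)

K, d = 8, 16          # number of nonoverlapping patches, patch dimension
N = 2000              # finite number of Gaussian training samples
noise_std = 0.01      # additive output noise level

def forward(X_patches, w):
    """Average of ReLU(w^T x_j) over the K nonoverlapping patches."""
    return np.maximum(X_patches @ w, 0.0).mean(axis=1)          # shape (N,)

def grad(X_patches, y, w):
    """Gradient of the empirical squared loss 1/(2N) * sum (f(x;w) - y)^2."""
    pre = X_patches @ w                                          # (N, K) pre-activations
    resid = np.maximum(pre, 0.0).mean(axis=1) - y                # (N,) residuals
    mask = (pre > 0).astype(float)                               # ReLU subgradient
    # d f / d w per sample: (1/K) * sum_j 1{pre_j > 0} x_j
    per_sample = (mask[:, :, None] * X_patches).mean(axis=1)     # (N, d)
    return (resid[:, None] * per_sample).mean(axis=0)

# Synthetic data: Gaussian patches, labels from the ground-truth filter.
X = rng.standard_normal((N, K, d))
w_star = rng.standard_normal(d)
y = forward(X, w_star) + noise_std * rng.standard_normal(N)

# Stand-in for the tensor initialization: a point near w_star
# (the paper's tensor method is what supplies such a point).
w = w_star + 0.3 * rng.standard_normal(d)
w_prev = w.copy()

eta, beta = 0.5, 0.7   # step size and heavy-ball momentum (illustrative values)
for t in range(200):
    g = grad(X, y, w)
    w_next = w - eta * g + beta * (w - w_prev)   # accelerated GD update
    w_prev, w = w, w_next

print("relative estimation error:", np.linalg.norm(w - w_star) / np.linalg.norm(w_star))
```

In this toy run the iterates contract toward w_star until the error floor set by the output noise, mirroring the linear-rate convergence up to the noise level that the abstract claims for the accelerated GD iterates.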

