Kim Yongdai, Ohn Ilsang, Kim Dongha
Department of Statistics and Department of Data Science, Seoul National University, Seoul 08826, Republic of Korea.
Department of Applied and Computational Mathematics and Statistics, The University of Notre Dame, Indiana 46530, USA.
Neural Netw. 2021 Jun;138:179-197. doi: 10.1016/j.neunet.2021.02.012. Epub 2021 Feb 23.
We derive fast convergence rates for a deep neural network (DNN) classifier with the rectified linear unit (ReLU) activation function learned using the hinge loss. We consider three cases for the true model: (1) a smooth decision boundary, (2) a smooth conditional class probability, and (3) the margin condition (i.e., the probability of inputs lying near the decision boundary is small). We show that the DNN classifier learned using the hinge loss achieves fast convergence rates in all three cases, provided that the architecture (i.e., the number of layers, number of nodes, and sparsity) is carefully selected. An important implication is that DNN architectures are flexible enough to be used in various cases without much modification. In addition, we consider a DNN classifier learned by minimizing the cross-entropy, and show that it achieves a fast convergence rate under the condition that both the noise exponent and the margin exponent are large. Although these conditions are strong, we explain why they are not unreasonable for image classification problems. To confirm our theoretical findings, we present the results of a small numerical study comparing the hinge loss and the cross-entropy.
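For readers unfamiliar with case (3), the margin condition is commonly formalized as a Tsybakov-type low-noise condition; the sketch below shows the standard statement (the paper's exact constants and exponent notation may differ):

```latex
% Tsybakov-type margin (low-noise) condition: the conditional class
% probability \eta(x) = P(Y = 1 \mid X = x) rarely falls near 1/2.
\[
  P_X\bigl( 0 < \lvert 2\eta(X) - 1 \rvert \le t \bigr) \le C\, t^{q}
  \quad \text{for all } t \in (0, t_0],
\]
% where q > 0 is the noise (margin) exponent; larger q means less
% probability mass near the decision boundary \{x : \eta(x) = 1/2\},
% which is what enables convergence rates faster than n^{-1/2}.
```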
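To make the numerical comparison concrete, here is a minimal sketch, not the authors' experiment: it trains the same ReLU network under the hinge loss and under the cross-entropy on synthetic 2D data with a smooth decision boundary (case 1). The architecture, data-generating boundary, and training settings are illustrative assumptions, and the sparsity constraint from the theory is omitted for brevity.

```python
import math
import torch
import torch.nn as nn

torch.manual_seed(0)

# Synthetic data on [-1, 1]^2: label +1 if x2 > sin(pi * x1),
# which gives a smooth (infinitely differentiable) decision boundary.
n = 2000
X = torch.rand(n, 2) * 2 - 1
y = (X[:, 1] > torch.sin(math.pi * X[:, 0])).float() * 2 - 1  # labels in {-1, +1}

def make_net():
    # Fully connected ReLU network; depth and width are arbitrary here.
    return nn.Sequential(
        nn.Linear(2, 64), nn.ReLU(),
        nn.Linear(64, 64), nn.ReLU(),
        nn.Linear(64, 1),
    )

def train(loss_name, epochs=200):
    net = make_net()
    opt = torch.optim.Adam(net.parameters(), lr=1e-2)
    for _ in range(epochs):
        opt.zero_grad()
        f = net(X).squeeze(1)  # real-valued score f(x); classify by sign(f)
        if loss_name == "hinge":
            # Hinge loss max(0, 1 - y f(x)) with labels in {-1, +1}.
            loss = torch.clamp(1 - y * f, min=0).mean()
        else:
            # Cross-entropy (logistic) loss on the same logits,
            # with labels mapped from {-1, +1} to {0, 1}.
            loss = nn.functional.binary_cross_entropy_with_logits(f, (y + 1) / 2)
        loss.backward()
        opt.step()
    with torch.no_grad():
        err = (torch.sign(net(X).squeeze(1)) != y).float().mean().item()
    return err

for name in ("hinge", "cross-entropy"):
    print(f"{name}: training 0-1 error = {train(name):.4f}")
```

Note the design point this isolates: both runs share the network and optimizer, so any difference in the 0-1 error reflects only the surrogate loss, mirroring the hinge-versus-cross-entropy comparison in the paper's numerical study.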