Laboratory for Information & Decision Systems, Massachusetts Institute of Technology, Cambridge, MA 02142.
Institute for Data, Systems, and Society, Massachusetts Institute of Technology, Cambridge, MA 02142.
Proc Natl Acad Sci U S A. 2023 Apr 4;120(14):e2208779120. doi: 10.1073/pnas.2208779120. Epub 2023 Mar 30.
While neural networks are used for classification tasks across domains, a long-standing open problem in machine learning is determining whether neural networks trained using standard procedures are consistent for classification, i.e., whether such models minimize the probability of misclassification for arbitrary data distributions. In this work, we identify and construct an explicit set of neural network classifiers that are consistent. Since effective neural networks in practice are typically both wide and deep, we analyze infinitely wide networks that are also infinitely deep. In particular, using the recent connection between infinitely wide neural networks and neural tangent kernels, we provide explicit activation functions that can be used to construct networks that achieve consistency. Interestingly, these activation functions are simple and easy to implement, yet differ from commonly used activations such as ReLU or sigmoid. More generally, we create a taxonomy of infinitely wide and deep networks and show that these models implement one of three well-known classifiers depending on the activation function used: 1) 1-nearest neighbor (model predictions are given by the label of the nearest training example); 2) majority vote (model predictions are given by the label of the class with the greatest representation in the training set); or 3) singular kernel classifiers (a set of classifiers containing those that achieve consistency). Our results highlight the benefit of using deep networks for classification tasks, in contrast to regression tasks, where excessive depth is harmful.
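The three limiting classifiers named in the abstract can be illustrated on toy data. The sketch below is a minimal illustration, not the paper's construction: the data, the kernel exponent `alpha`, and the function names are assumptions made for this example, and the singular kernel K(x, x') = ||x - x'||^{-alpha} stands in for the general class of singular kernel classifiers discussed in the paper.

```python
import numpy as np

# Toy training set: two Gaussian blobs labeled -1 and +1 (assumed data, not from the paper).
rng = np.random.default_rng(0)
X_train = np.vstack([rng.normal(-1.0, 1.0, (50, 2)), rng.normal(1.0, 1.0, (50, 2))])
y_train = np.concatenate([-np.ones(50), np.ones(50)])


def one_nearest_neighbor(X_train, y_train, x):
    """Predict the label of the closest training point (1-nearest neighbor)."""
    dists = np.linalg.norm(X_train - x, axis=1)
    return y_train[np.argmin(dists)]


def majority_vote(y_train, x):
    """Predict the most common label in the training set, ignoring the input x."""
    values, counts = np.unique(y_train, return_counts=True)
    return values[np.argmax(counts)]


def singular_kernel_classifier(X_train, y_train, x, alpha=1.0):
    """Weighted vote with a kernel K(x, x') = ||x - x'||^{-alpha} that diverges
    as x approaches a training point; alpha is an illustrative value chosen here,
    not a constant taken from the paper."""
    dists = np.maximum(np.linalg.norm(X_train - x, axis=1), 1e-12)  # avoid division by zero
    weights = dists ** (-alpha)
    return np.sign(weights @ y_train)


x_test = np.array([0.3, -0.2])
print(one_nearest_neighbor(X_train, y_train, x_test))
print(majority_vote(y_train, x_test))
print(singular_kernel_classifier(X_train, y_train, x_test))
```

The first two predictors depend only on the nearest training point or on class frequencies, while the singular kernel classifier interpolates between them by letting nearby training points dominate the vote, which is the behavior the paper associates with consistency.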