Department of Biomedical Engineering, Rensselaer Polytechnic Institute, Troy, NY, 12180, USA.
IBM Thomas J. Watson Research Center, Yorktown Heights, NY, 10598, USA.
Neural Netw. 2020 Apr;124:383-392. doi: 10.1016/j.neunet.2020.01.007. Epub 2020 Jan 18.
Recently, deep learning has achieved huge successes in many important applications. In our previous studies, we proposed quadratic/second-order neurons and deep quadratic neural networks. In a quadratic neuron, the inner product between the data vector and the weight vector of a conventional neuron is replaced with a quadratic function of the input. The resultant quadratic neuron enjoys an enhanced expressive capability over the conventional neuron. However, how quadratic neurons improve the expressive capability of a deep quadratic network has not been studied so far, especially in relation to that of a conventional neural network. Specifically, we ask four basic questions in this paper: (1) For the one-hidden-layer network structure, is there any function that a quadratic network can approximate much more efficiently than a conventional network? (2) For the same multi-layer network structure, is there any function that can be expressed by a quadratic network but cannot be expressed with conventional neurons in the same structure? (3) Does a quadratic network give a new insight into universal approximation? (4) To approximate the same class of functions with the same error bound, does a quantized quadratic network require fewer weights than a quantized conventional network? Our main contributions are four interconnected theorems that shed light on these questions and demonstrate the merits of a quadratic network in terms of expressive efficiency, unique capability, compact architecture, and computational capacity, respectively.
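To make the contrast concrete, here is a minimal NumPy sketch of the two neuron types. The quadratic parameterization shown (a product of two affine terms plus a term in the element-wise square of the input) follows one common form of second-order neuron; the parameter names (wr, wg, wb, br, bg, c) and the sigmoid activation are illustrative assumptions, not the paper's exact definition.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def conventional_neuron(x, w, b):
    # Conventional neuron: inner product of data and weights, then activation.
    return sigmoid(np.dot(w, x) + b)

def quadratic_neuron(x, wr, wg, wb, br, bg, c):
    # Quadratic neuron: the inner product is replaced by a quadratic function
    # of the input -- here, a product of two affine terms plus a term in the
    # element-wise square of x (an assumed parameterization).
    q = (np.dot(wr, x) + br) * (np.dot(wg, x) + bg) + np.dot(wb, x * x) + c
    return sigmoid(q)

# Example: a single quadratic neuron can represent the product x1 * x2
# exactly (before the activation), which a single conventional neuron cannot.
x = np.array([0.5, -1.2])
print(conventional_neuron(x, np.array([1.0, 1.0]), 0.0))
print(quadratic_neuron(x, np.array([1.0, 0.0]), np.array([0.0, 1.0]),
                       np.zeros(2), 0.0, 0.0, 0.0))
```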