Dasgupta B, Siegelmann H T, Sontag E
Dept. of Comput. Sci., Minnesota Univ., Minneapolis, MN.
IEEE Trans Neural Netw. 1995;6(6):1490-504. doi: 10.1109/72.471360.
Deals with computational issues of loading a fixed-architecture neural network with a set of positive and negative examples. This is the first result on the hardness of loading a simple three-node architecture that does not consist of binary-threshold neurons but instead uses a particular continuous activation function commonly encountered in the neural-network literature. The authors observe that the loading problem is solvable in polynomial time when the input dimension is constant; otherwise, any possible learning algorithm based on a particular fixed architecture faces severe computational barriers. Similar theorems had previously been proved by Megiddo and by Blum and Rivest, but only for the case of binary-threshold networks. The authors' theoretical results lend further support to the use of incremental (architecture-changing) techniques for training networks rather than fixed architectures. Furthermore, they imply hardness of learnability in the probably approximately correct sense as well.
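For readers unfamiliar with the setup, the following is a minimal sketch of the "loading" decision problem the abstract refers to: given a fixed three-node architecture (two hidden units feeding one output) and a labeled sample, decide whether some weight assignment classifies every example correctly. The saturating piecewise-linear activation `pi`, the weight names, and the 0.5 decision threshold below are illustrative assumptions, not the authors' exact construction.

```python
import numpy as np

def pi(z):
    # Saturating piecewise-linear activation (an assumed example of a
    # continuous activation): 0 for z < 0, identity on [0, 1], 1 for z > 1.
    return np.clip(z, 0.0, 1.0)

def three_node_net(x, W, b, v, c):
    # Fixed architecture: two hidden units with activation pi, one output unit.
    h = pi(W @ x + b)           # hidden layer, shape (2,)
    return float(v @ h + c)     # scalar output

def loads(examples, W, b, v, c, threshold=0.5):
    # The loading problem asks whether weights exist that classify every
    # labeled example correctly; this only checks one candidate assignment.
    return all((three_node_net(x, W, b, v, c) > threshold) == label
               for x, label in examples)

# Toy sample in constant input dimension (d = 2), the regime where the
# abstract notes loading is polynomial-time; the hardness results concern
# inputs of growing dimension.
examples = [(np.array([0.0, 0.0]), False),
            (np.array([1.0, 1.0]), True)]
W = np.array([[1.0, 1.0], [-1.0, -1.0]])
b = np.array([-1.0, 0.5])
v = np.array([1.0, 0.0])
c = 0.0
print(loads(examples, W, b, v, c))  # True: these weights load the toy sample
```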