Piecewise convexity of artificial neural networks.

Affiliations

Stanford University, Department of Electrical Engineering, 1201 Welch Rd, Stanford, CA, 94305, USA.

Stanford University, Department of Radiology (Biomedical Informatics Research), 1201 Welch Rd, Stanford, CA, 94305, USA.

Publication information

Neural Netw. 2017 Oct;94:34-45. doi: 10.1016/j.neunet.2017.06.009. Epub 2017 Jul 3.

Abstract

Although artificial neural networks have shown great promise in applications including computer vision and speech recognition, there remains considerable practical and theoretical difficulty in optimizing their parameters. The seemingly unreasonable success of gradient descent methods in minimizing these non-convex functions remains poorly understood. In this work we offer some theoretical guarantees for networks with piecewise affine activation functions, which have in recent years become the norm. We prove three main results. First, that the network is piecewise convex as a function of the input data. Second, that the network, considered as a function of the parameters in a single layer, all others held constant, is again piecewise convex. Third, that the network as a function of all its parameters is piecewise multi-convex, a generalization of biconvexity. From here we characterize the local minima and stationary points of the training objective, showing that they minimize the objective on certain subsets of the parameter space. We then analyze the performance of two optimization algorithms on multi-convex problems: gradient descent, and a method which repeatedly solves a number of convex sub-problems. We prove necessary convergence conditions for the first algorithm and both necessary and sufficient conditions for the second, after introducing regularization to the objective. Finally, we remark on the remaining difficulty of the global optimization problem. Under the squared error objective, we show that by varying the training data, a single rectifier neuron admits local minima arbitrarily far apart, both in objective value and parameter space.
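
The closing claim, that a single rectifier (ReLU) neuron under the squared error objective admits local minima arbitrarily far apart, can be checked numerically. The sketch below is not from the paper; the two-point training set and the constants a and b are illustrative assumptions. On each side of w = 0 the loss is a convex quadratic (the piecewise convexity in the parameters), and the two pieces have separate minima whose separation grows with a and b.

import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def loss(w, xs, ys):
    # Squared-error objective of a single rectifier neuron f(x) = relu(w * x).
    return np.sum((relu(w * xs) - ys) ** 2)

# Hypothetical two-point training set: one positive and one negative input.
# For w > 0 the loss is (w - a)^2 + b^2, minimized at w = a;
# for w < 0 it is a^2 + (w + b)^2, minimized at w = -b.
# Each piece is a convex quadratic, so the loss is piecewise convex in w,
# and growing a and b pushes the two local minima arbitrarily far apart.
a, b = 10.0, 3.0
xs = np.array([1.0, -1.0])
ys = np.array([a, b])

ws = np.linspace(-15.0, 15.0, 3001)
vals = np.array([loss(w, xs, ys) for w in ws])

# Locate interior grid points that are local minima of the sampled loss.
is_min = (vals[1:-1] <= vals[:-2]) & (vals[1:-1] <= vals[2:])
for w in ws[1:-1][is_min]:
    print(f"local minimum near w = {w:+.2f}, objective = {loss(w, xs, ys):.2f}")
# Expected: minima near w = -3.00 (objective 100.00) and w = +10.00 (objective 9.00).

Scaling a and b moves the two minima, and the gap between their objective values, as far apart as desired, which is why no local guarantee extends to the global problem.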
