Kůrková, Věra
Institute of Computer Science, Academy of Sciences of the Czech Republic, Prague, CZ 18207.
Neural Comput. 2008 Jan;20(1):252-70. doi: 10.1162/neco.2008.20.1.252.
Supervised learning of perceptron networks is investigated as an optimization problem. It is shown that both the theoretical and the empirical error functionals achieve minima over sets of functions computable by networks with a given number n of perceptrons. Upper bounds on the rates of convergence of these minima with increasing n are derived. The bounds depend on a certain regularity of the training data, expressed in terms of variational norms of functions interpolating the data (in the case of the empirical error) and of the regression function (in the case of the expected error). The dependence of this type of regularity on dimensionality and on the magnitudes of partial derivatives is investigated. Conditions on the data are derived that guarantee that a good approximation of the global minima of the error functionals can be achieved by networks of limited complexity. The conditions are stated in terms of the oscillatory behavior of the data, measured by the product of a function of the number of variables d, which decreases exponentially fast, and the maximum of the squares of the L1-norms of the order-d iterated partial derivatives of the regression function or of some function interpolating the data sample. The results are illustrated by examples of data with low and high regularity constructed using Boolean functions and the Gaussian function.
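To make the shape of these results concrete, the following is a minimal sketch of the two bounds described above, in the spirit of the Maurey-Jones-Barron argument that typically underlies such rates; the set G, the variational norm, the constant c, and the function k(d) are illustrative notation, not the paper's exact statement.

% Sketch (illustrative notation): G denotes the set of functions computable
% by a single perceptron, and \|h\|_G the variational norm of h with
% respect to G. For networks f_n with n perceptrons, rates of the form
\[
  \mathcal{E}(f_n) \;-\; \inf_{f}\,\mathcal{E}(f) \;\le\; \frac{c\,\|h\|_G^{2}}{n}
\]
% are obtained, where h is the regression function (expected error) or a
% function interpolating the sample (empirical error). The regularity
% condition then bounds the variational norm by
\[
  \|h\|_G^{2} \;\le\; k(d)\,\max\,\Bigl\|\frac{\partial^{d} h}{\partial x_{1}\cdots\partial x_{d}}\Bigr\|_{L_1}^{2},
\]
% with k(d) decreasing exponentially fast in the number of variables d and
% the maximum taken over the order-d iterated partial derivatives.

Read this way, networks of limited complexity achieve a good approximation of the global minima whenever the exponential decay of k(d) dominates the growth of the derivative norms; this trade-off is what the Boolean and Gaussian examples illustrate.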