School of Data Science, City University of Hong Kong, Kowloon, Hong Kong.
Neural Netw. 2021 Dec;144:778-790. doi: 10.1016/j.neunet.2021.09.027. Epub 2021 Oct 6.
We consider a family of deep neural networks consisting of two groups of convolutional layers, a downsampling operator, and a fully connected layer. The network structure depends on two structural parameters which determine the numbers of convolutional layers and the width of the fully connected layer. We establish an approximation theory with explicit approximation rates when the approximated function takes a composite form f∘Q with a feature polynomial Q and a univariate function f. In particular, we prove that such a network can outperform fully connected shallow networks in approximating radial functions with Q(x)=|x|^2, when the dimension d of data from R^d is large. This gives the first rigorous proof for the superiority of deep convolutional neural networks in approximating functions with special structures. Then we carry out generalization analysis for empirical risk minimization with such a deep network in a regression framework with the regression function of the form f∘Q. Our network structure, which does not use any composite information or the functions Q and f, can automatically extract features and make use of the composite nature of the regression function via tuning the structural parameters. Our analysis provides an error bound which decreases with the network depth to a minimum and then increases, verifying theoretically a trade-off phenomenon observed for network depths in many practical applications.
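To make the architecture described in the abstract concrete, here is a minimal forward-pass sketch in Python with NumPy. It is an illustration under stated assumptions, not the construction used in the paper: the names `conv_group`, `downsample`, `forward`, `init_params`, the parameters `J1`, `J2`, `s`, `step`, `width`, the use of full 1-D convolutions with unrestricted bias vectors, and the random weights are all hypothetical choices made here. No training is performed.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def conv_group(h, filters, biases):
    """Apply a group of 1-D convolutional layers with ReLU activation.
    Full convolution is assumed, so each layer widens h by len(w) - 1."""
    for w, b in zip(filters, biases):
        h = relu(np.convolve(w, h, mode="full") - b)
    return h

def downsample(h, step):
    """Keep every `step`-th entry of h (entries step, 2*step, ... in 1-based indexing)."""
    return h[step - 1::step]

def forward(x, params, step):
    """Conv group 1 -> downsampling -> conv group 2 -> fully connected layer -> scalar."""
    h = conv_group(x, params["filters1"], params["biases1"])
    h = downsample(h, step)
    h = conv_group(h, params["filters2"], params["biases2"])
    h = relu(params["W_fc"] @ h - params["b_fc"])
    return float(params["c"] @ h)

def init_params(d, s, J1, J2, step, width, rng):
    """Random parameters (assumption: filter length s + 1, Gaussian entries)."""
    def group(n, depth):
        filters, biases = [], []
        for _ in range(depth):
            filters.append(rng.standard_normal(s + 1))
            n += s  # full convolution adds s entries
            biases.append(rng.standard_normal(n))
        return filters, biases, n

    filters1, biases1, n = group(d, J1)
    n //= step  # number of entries kept by the downsampling step
    filters2, biases2, n = group(n, J2)
    return {
        "filters1": filters1, "biases1": biases1,
        "filters2": filters2, "biases2": biases2,
        "W_fc": rng.standard_normal((width, n)),
        "b_fc": rng.standard_normal(width),
        "c": rng.standard_normal(width),
    }

# Demo: evaluate an untrained network on one input from R^d.
rng = np.random.default_rng(0)
d, s, J1, J2, step, width = 8, 2, 2, 1, 4, 5
params = init_params(d, s, J1, J2, step, width, rng)
x = rng.standard_normal(d)
y = forward(x, params, step)

# A radial target of the form f(Q(x)) with Q(x) = |x|^2; f = sin is an arbitrary choice.
target = np.sin(np.dot(x, x))
```

In this sketch the depths `J1` and `J2` and the width `width` play the role of the structural parameters mentioned above. How the paper ties them to two parameters, and which restrictions it places on filters and biases, is not specified in the abstract and is not reproduced here.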