Sun Yan, Song Qifan, Liang Faming
Department of Statistics, Purdue University, West Lafayette, IN 47907.
J Am Stat Assoc. 2022;117(540):1981-1995. doi: 10.1080/01621459.2021.1895175. Epub 2021 Apr 20.
Deep learning has been the engine powering many successes of data science. However, the deep neural network (DNN), as the basic model of deep learning, is often excessively over-parameterized, causing many difficulties in training, prediction and interpretation. We propose a frequentist-like method for learning sparse DNNs and justify its consistency under the Bayesian framework: the proposed method could learn a sparse DNN with at most O(n/log(n)) connections and nice theoretical guarantees such as posterior consistency, variable selection consistency and asymptotically optimal generalization bounds. In particular, we establish posterior consistency for the sparse DNN with a mixture Gaussian prior, show that the structure of the sparse DNN can be consistently determined using a Laplace approximation-based marginal posterior inclusion probability approach, and use Bayesian evidence to elicit sparse DNNs learned by an optimization method such as stochastic gradient descent in multiple runs with different initializations. The proposed method is computationally more efficient than standard Bayesian methods for large-scale sparse DNNs. The numerical results indicate that the proposed method can perform very well for large-scale network compression and high-dimensional nonlinear variable selection, both advancing interpretable machine learning.
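To make the main ingredients concrete, the following is a minimal sketch, not the paper's exact specification: a two-component mixture Gaussian (spike-and-slab) prior on each connection weight, and the implied marginal posterior inclusion probability evaluated at an estimate obtained from an SGD run (the paper works with a Laplace approximation of the posterior rather than a simple plug-in); the hyperparameter names lambda_n, sigma_{0,n}, sigma_{1,n} and the 0.5 threshold below are illustrative.

% Sketch of a mixture Gaussian prior on a connection weight w_i,
% with a small-variance "spike" component and a large-variance "slab" component.
\[
  \pi(w_i) \;=\; \lambda_n\, N\!\bigl(w_i;\,0,\,\sigma_{1,n}^2\bigr)
          \;+\; (1-\lambda_n)\, N\!\bigl(w_i;\,0,\,\sigma_{0,n}^2\bigr),
  \qquad \sigma_{0,n}^2 \ll \sigma_{1,n}^2 .
\]
% Marginal posterior inclusion probability for connection i, here shown as a
% plug-in at an SGD/MAP-type estimate \hat{w}_i; \phi(\cdot;0,\sigma) denotes
% the N(0,\sigma^2) density.
\[
  \widehat{q}_i \;=\; P\bigl(\gamma_i = 1 \mid \widehat{w}_i\bigr)
  \;=\; \frac{\lambda_n\, \phi(\widehat{w}_i;\,0,\,\sigma_{1,n})}
             {\lambda_n\, \phi(\widehat{w}_i;\,0,\,\sigma_{1,n})
              \;+\; (1-\lambda_n)\, \phi(\widehat{w}_i;\,0,\,\sigma_{0,n})} .
\]

In this sketch, connections with \(\widehat{q}_i\) above a threshold (e.g., 0.5) are retained to form the sparse structure, and among the sparse networks produced by multiple differently initialized SGD runs, the one with the largest approximate Bayesian evidence is reported, mirroring the elicitation step described in the abstract.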