

Layer adaptive node selection in Bayesian neural networks: Statistical guarantees and implementation details.

Authors

Jantre Sanket, Bhattacharya Shrijita, Maiti Tapabrata

Affiliation

Department of Statistics and Probability, Michigan State University, United States of America.

Publication

Neural Netw. 2023 Oct;167:309-330. doi: 10.1016/j.neunet.2023.08.029. Epub 2023 Aug 22.

Abstract

Sparse deep neural networks have proven efficient for predictive model building in large-scale studies. Although several works have studied the theoretical and numerical properties of sparse neural architectures, they have primarily focused on edge selection. Sparsity through edge selection may be intuitively appealing; however, it does not necessarily reduce the structural complexity of a network. Instead, pruning excessive nodes leads to a structurally sparse network with significant computational speedup during inference. To this end, we propose a Bayesian sparse solution using spike-and-slab Gaussian priors to allow for automatic node selection during training. The use of the spike-and-slab prior alleviates the need for an ad hoc thresholding rule for pruning. In addition, we adopt a variational Bayes approach to circumvent the computational challenges of a traditional Markov chain Monte Carlo (MCMC) implementation. In the context of node selection, we establish the fundamental result of variational posterior consistency together with a characterization of the prior parameters. In contrast to previous works, our theoretical development relaxes the assumptions of an equal number of nodes and uniform bounds on all network weights, thereby accommodating sparse networks with layer-dependent node structures or coefficient bounds. With a layer-wise characterization of the prior inclusion probabilities, we discuss the optimal contraction rates of the variational posterior. We empirically demonstrate that our proposed approach outperforms the edge selection method in computational complexity while delivering similar or better predictive performance. Our experimental evidence further substantiates that our theoretical work facilitates layer-wise optimal node recovery.
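The node-selection idea described in the abstract can be sketched in a few lines. The following is a minimal illustrative NumPy sketch, not the authors' implementation: it assumes one logistic-parameterized inclusion probability per output node (gating that node's entire set of incoming weights) and a Gaussian slab over those weights, with a node kept when its posterior inclusion probability exceeds 1/2. The class and parameter names (`SpikeSlabLayer`, `logit_gamma`, `log_sigma`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

class SpikeSlabLayer:
    """Illustrative variational spike-and-slab layer (assumed form, not the
    paper's exact model). Each output node j carries an inclusion probability
    gamma[j]; when included, its incoming weights follow the Gaussian slab
    N(mu, sigma^2), and when excluded they collapse to the spike at zero."""

    def __init__(self, d_in, d_out):
        self.mu = rng.normal(scale=0.1, size=(d_in, d_out))  # slab means
        self.log_sigma = np.full((d_in, d_out), -3.0)        # slab log-stddevs
        self.logit_gamma = np.zeros(d_out)                   # node inclusion logits

    def gamma(self):
        # posterior inclusion probability per node (sigmoid of the logit)
        return 1.0 / (1.0 + np.exp(-self.logit_gamma))

    def sample_forward(self, x):
        # reparameterized draw from the slab, gated node-wise by Bernoulli z
        sigma = np.exp(self.log_sigma)
        w = self.mu + sigma * rng.standard_normal(self.mu.shape)
        z = (rng.random(self.logit_gamma.shape) < self.gamma()).astype(float)
        return (x @ w) * z  # excluded nodes contribute exactly zero

    def kept_nodes(self):
        # node selection without an ad hoc threshold on weight magnitudes:
        # keep node j iff its inclusion probability exceeds 1/2
        return np.nonzero(self.gamma() > 0.5)[0]
```

Because the gate acts on whole nodes rather than individual edges, dropping a node removes an entire row/column of computation, which is the source of the structural sparsity and inference speedup the abstract emphasizes.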

