
The deep arbitrary polynomial chaos neural network or how Deep Artificial Neural Networks could benefit from data-driven homogeneous chaos theory.

Affiliations

Department of Stochastic Simulation and Safety Research for Hydrosystems, Institute for Modelling Hydraulic and Environmental Systems, Stuttgart Center for Simulation Science, University of Stuttgart, Pfaffenwaldring 5a, 70569 Stuttgart, Germany.


Publication information

Neural Netw. 2023 Sep;166:85-104. doi: 10.1016/j.neunet.2023.06.036. Epub 2023 Jul 10.

Abstract

Artificial Intelligence and Machine Learning have been widely used in various fields of mathematical computing, physical modeling, computational science, communication science, and stochastic analysis. Approaches based on Deep Artificial Neural Networks (DANNs) are very popular nowadays. Depending on the learning task, the exact form of a DANN is determined by its multi-layer architecture, activation functions, and the so-called loss function. However, for the majority of deep learning approaches based on DANNs, the kernel structure of neural signal processing remains the same: the node response is encoded as a linear superposition of neural activity, while non-linearity is triggered by the activation functions. In the current paper, we propose analyzing the neural signal processing in DANNs from the point of view of homogeneous chaos theory as known from polynomial chaos expansion (PCE). From the PCE perspective, the (linear) response on each node of a DANN can be seen as a first-degree multi-variate polynomial of single neurons from the previous layer, i.e. a linear weighted sum of monomials. From this point of view, the conventional DANN structure relies implicitly (but erroneously) on a Gaussian distribution of neural signals. Additionally, this view reveals that, by design, DANNs do not necessarily fulfill any orthogonality or orthonormality condition for the majority of data-driven applications. Therefore, the prevailing handling of neural signals in DANNs can lead to a redundant representation, as any neural signal may contain partial information from other neural signals. To tackle that challenge, we propose employing the data-driven generalization of PCE theory known as arbitrary polynomial chaos (aPC) to construct a corresponding multi-variate orthonormal representation on each node of a DANN. Doing so, we generalize the conventional structure of DANNs to Deep arbitrary polynomial chaos neural networks (DaPC NNs).
DaPC NNs decompose the neural signals that travel through the multi-layer structure by adaptively constructing data-driven multi-variate orthonormal bases for each layer. Moreover, the introduced DaPC NN offers an opportunity to go beyond the linear weighted superposition of single neurons on each node. Inheriting the fundamentals of PCE theory, the DaPC NN additionally makes it possible to account for high-order neural effects that reflect simultaneous interactions in multi-layer networks. Introducing the high-order weighted superposition on each node of the network mitigates the need to introduce non-linearity via activation functions and hence reduces the room for potential subjectivity in the modeling procedure, although the current DaPC NN framework imposes no theoretical restrictions on the use of activation functions. The current paper also summarizes relevant properties of DaPC NNs inherited from aPC, such as analytical expressions for statistical quantities and sensitivity indices on each node. We also offer an analytical form of the partial derivatives that can be used in various training algorithms. Technically, DaPC NNs require training procedures similar to those of conventional DANNs, and all trained weights automatically determine the corresponding multi-variate data-driven orthonormal bases for all layers of the DaPC NN. The paper uses three test cases to illustrate the performance of the DaPC NN, comparing it with the performance of the conventional DANN and also with a plain aPC expansion. Convergence with respect to training data size, evaluated against validation data sets, demonstrates that the DaPC NN systematically outperforms the conventional DANN. Overall, the suggested re-formulation of the kernel network structure in terms of homogeneous chaos theory is not limited to any particular architecture or any particular definition of the loss function. The DaPC NN Matlab Toolbox is available online, and users are invited to adapt it to their own needs.
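The core aPC step the abstract refers to, building an orthonormal polynomial basis directly from data samples rather than assuming a Gaussian measure, can be sketched as Gram-Schmidt orthonormalization of monomials under the empirical inner product. The following is a minimal illustrative sketch in Python, not the paper's Matlab Toolbox code; the function name `apc_orthonormal_basis` and the gamma-distributed test data are hypothetical choices for demonstration:

```python
import numpy as np

def apc_orthonormal_basis(samples, degree):
    """Gram-Schmidt on the monomials 1, x, x^2, ... using the
    data-driven inner product <f, g> = mean(f(X) * g(X))."""
    # Column k of V holds x**k evaluated on the samples
    V = np.vander(samples, degree + 1, increasing=True).astype(float)
    Q = np.zeros_like(V)                       # orthonormalized basis values
    C = np.zeros((degree + 1, degree + 1))     # C[k, j]: coeff of x**j in p_k
    for k in range(degree + 1):
        c = np.zeros(degree + 1)
        c[k] = 1.0                             # start from the raw monomial x**k
        v = V[:, k].copy()
        for j in range(k):                     # subtract projections on p_0..p_{k-1}
            proj = np.mean(v * Q[:, j])
            v -= proj * Q[:, j]
            c -= proj * C[j]
        norm = np.sqrt(np.mean(v * v))         # normalize under the data measure
        Q[:, k] = v / norm
        C[k] = c / norm
    return C

rng = np.random.default_rng(0)
x = rng.gamma(2.0, size=5000)                  # arbitrary, clearly non-Gaussian data
C = apc_orthonormal_basis(x, 3)
P = np.vander(x, 4, increasing=True) @ C.T     # evaluate the basis on the data
G = P.T @ P / len(x)                           # empirical Gram matrix
print(np.allclose(G, np.eye(4), atol=1e-6))    # True: orthonormal under the data
```

Degree-one truncation of such a basis reproduces the conventional linear node response, which is the sense in which the abstract calls a standard DANN node a first-degree polynomial of the previous layer.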
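The contrast between a conventional linear node and the high-order weighted superposition on a DaPC NN node can be illustrated with raw tensor-product monomial terms (shown here before orthonormalization, which a real DaPC NN would apply as described above). This sketch uses our own hypothetical helper name `multivariate_basis`:

```python
import numpy as np
from itertools import product

def multivariate_basis(Z, degree):
    """All tensor-product monomial terms of total degree <= `degree`.
    Z: (n_samples, n_neurons) signals from the previous layer.
    Returns Psi: (n_samples, n_terms) and the exponent tuples."""
    n, m = Z.shape
    exps = [e for e in product(range(degree + 1), repeat=m) if sum(e) <= degree]
    Psi = np.stack([np.prod(Z ** np.array(e), axis=1) for e in exps], axis=1)
    return Psi, exps

rng = np.random.default_rng(1)
Z = rng.normal(size=(4, 2))            # four samples from two previous-layer neurons
Psi1, e1 = multivariate_basis(Z, 1)    # degree 1: [1, z2, z1] -> conventional linear node
Psi2, e2 = multivariate_basis(Z, 2)    # degree 2 adds z1*z2, z1**2, z2**2 interactions
print(Psi1.shape, Psi2.shape)          # (4, 3) (4, 6)

w = rng.normal(size=Psi2.shape[1])     # trained weights in a real network
node_response = Psi2 @ w               # high-order weighted superposition on one node
```

With `degree=1` the node reduces to a bias plus a linear weighted sum of the previous layer, while higher degrees add the interaction terms that let the network capture non-linearity without relying solely on activation functions.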

