Université de Lorraine, LGIPM, Metz, 57000, France.
Université de Lorraine, LGIPM, Metz, 57000, France; Institut Universitaire de France (IUF), Paris, France.
Neural Netw. 2024 Feb;170:149-166. doi: 10.1016/j.neunet.2023.11.032. Epub 2023 Nov 13.
This paper addresses a large class of nonsmooth nonconvex stochastic DC (difference-of-convex functions) programs in which endogenous uncertainty is involved and i.i.d. (independent and identically distributed) samples are not available. Instead, we assume that it is only possible to access Markov chains whose sequences of distributions converge to the target distributions. This setting is natural, as Markovian noise arises in many contexts, including Bayesian inference, reinforcement learning, and stochastic optimization in high-dimensional or combinatorial spaces. We then design a stochastic algorithm named Markov chain stochastic DCA (MCSDCA), based on the DC algorithm (DCA), a well-known method for nonconvex optimization. We establish convergence guarantees in both asymptotic and nonasymptotic senses. MCSDCA is then applied to deep learning via PDE (partial differential equation) regularization, where two realizations of MCSDCA are constructed, namely MCSDCA-odLD and MCSDCA-udLD, based on overdamped and underdamped Langevin dynamics, respectively. Numerical experiments on time series prediction and image classification problems with a variety of neural network topologies show the merits of the proposed methods.
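To make the ingredients concrete, the following is a minimal, hypothetical sketch (not the authors' implementation) of how a DCA outer loop can be combined with an overdamped Langevin inner sampler, in the spirit of MCSDCA-odLD. The objective is written as f(x) = E_xi[G(x, xi)] - h(x) with G(., xi) and h convex; each DCA step linearizes h at the current iterate, and the expectation in the resulting convex subproblem is approximated with correlated samples drawn from a Langevin chain (Euler-Maruyama discretization) rather than i.i.d. draws. All function names, step sizes, and the single-gradient-step subproblem solver below are illustrative assumptions.

```python
import numpy as np

def langevin_chain(grad_log_pi, x0, step, n_steps, rng):
    """Overdamped Langevin (Euler-Maruyama) chain targeting density pi.

    x_{k+1} = x_k + step * grad_log_pi(x_k) + sqrt(2 * step) * N(0, I).
    The resulting samples are Markovian (correlated), not i.i.d.
    """
    x = np.array(x0, dtype=float)
    samples = []
    for _ in range(n_steps):
        noise = rng.standard_normal(x.shape)
        x = x + step * grad_log_pi(x) + np.sqrt(2.0 * step) * noise
        samples.append(x.copy())
    return samples

def mcsdca_sketch(grad_g_sample, grad_h, grad_log_pi, x0,
                  lr=1e-2, langevin_step=1e-3, n_inner=50,
                  n_outer=100, seed=0):
    """Schematic MCSDCA-style loop (illustrative, not the paper's algorithm).

    f(x) = E_xi[G(x, xi)] - h(x), with G(., xi) and h convex. Each outer
    iteration linearizes h (the DCA step) and estimates the expectation
    with Markov-chain samples produced by the Langevin dynamics above.
    """
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    xi = np.zeros_like(x)  # current state of the Markov noise chain
    for _ in range(n_outer):
        # Draw correlated samples xi_1, ..., xi_m by continuing the chain.
        xis = langevin_chain(grad_log_pi, xi, langevin_step, n_inner, rng)
        xi = xis[-1]
        # Monte Carlo estimate of grad g at x from the Markovian samples.
        grad_g = np.mean([grad_g_sample(x, s) for s in xis], axis=0)
        # DCA step: a single gradient step on g(x) - <grad h(x_k), x>
        # stands in for solving the convex subproblem exactly.
        x = x - lr * (grad_g - grad_h(x))
    return x

# Toy usage: g(x) = E[||x - xi||^2 / 2], h(x) = ||x||^2 / 4, with the
# noise chain targeting a standard Gaussian (grad log pi(s) = -s).
x_star = mcsdca_sketch(
    grad_g_sample=lambda x, s: x - s,
    grad_h=lambda x: 0.5 * x,
    grad_log_pi=lambda s: -s,
    x0=np.ones(3),
)
```

In this toy problem f(x) = ||x||^2 / 4 + const, so the iterates should contract toward the origin; replacing the gradient step with an exact convex-subproblem solver would recover a closer analogue of a full DCA iteration.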