
Stochastic quasi-gradient methods: variance reduction via Jacobian sketching.

Author Information

Robert M. Gower, Peter Richtárik, Francis Bach

Affiliations

LTCI, Télécom Paris, Institut Polytechnique de Paris, Palaiseau, France.

King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia.

Publication Information

Math Program. 2021;188(1):135-192. doi: 10.1007/s10107-020-01506-0. Epub 2020 May 12.

Abstract

We develop a new family of variance reduced stochastic gradient descent methods for minimizing the average of a very large number of smooth functions. Our method, JacSketch, is motivated by novel developments in randomized numerical linear algebra, and operates by maintaining a stochastic estimate of a Jacobian matrix composed of the gradients of individual functions. In each iteration, JacSketch efficiently updates the Jacobian matrix by first obtaining a random linear measurement of the true Jacobian through (cheap) sketching, and then projecting the previous estimate onto the solution space of a linear matrix equation whose solutions are consistent with the measurement. The Jacobian estimate is then used to compute a variance-reduced unbiased estimator of the gradient. Our strategy is analogous to the way quasi-Newton methods maintain an estimate of the Hessian, and hence our method can be seen as a stochastic quasi-gradient method. Our method can also be seen as stochastic gradient descent applied to a controlled stochastic reformulation of the original problem, where the control comes from the Jacobian estimates. We prove that for smooth and strongly convex functions, JacSketch converges linearly with a meaningful rate dictated by a single convergence theorem which applies to general sketches. We also provide a refined convergence theorem which applies to a smaller class of sketches, featuring a novel proof technique based on a stochastic Lyapunov function. This enables us to obtain sharper complexity results for variants of JacSketch with importance sampling. By specializing our general approach to specific sketching strategies, JacSketch reduces to the celebrated stochastic average gradient (SAGA) method, together with several of its existing and many new minibatch, reduced memory, and importance sampling variants. Our rate for SAGA with importance sampling is the current best-known rate for this method, resolving a conjecture by Schmidt et al. (Proceedings of the eighteenth international conference on artificial intelligence and statistics, AISTATS 2015, San Diego, California, 2015). The rates we obtain for minibatch SAGA are also superior to existing rates and are sufficiently tight as to show a decrease in total complexity as the minibatch size increases. Moreover, we obtain the first minibatch SAGA method with importance sampling.
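To make the update above concrete, the following is a minimal Python sketch (not the authors' reference implementation) of the single-column-sketch special case, which, as the abstract notes, recovers SAGA: each iteration samples one function, forms the variance-reduced unbiased gradient estimator from a fresh gradient, the corresponding stale Jacobian column, and the column average, and then overwrites that column. The least-squares objective f_i(x) = 0.5*(a_i^T x - b_i)^2, the step size, and the synthetic data are illustrative assumptions, not taken from the paper.

import numpy as np

def jacsketch_saga(A, b, step_size=0.005, n_iters=20000, seed=0):
    """SAGA viewed as JacSketch with single-column sketches (illustrative)."""
    rng = np.random.default_rng(seed)
    n, d = A.shape
    x = np.zeros(d)
    J = np.zeros((d, n))          # column i: current estimate of grad f_i(x)
    J_mean = J.mean(axis=1)       # average of the columns, maintained incrementally

    for _ in range(n_iters):
        i = rng.integers(n)                    # single-column sketch: sample one f_i
        g_i = A[i] * (A[i] @ x - b[i])         # fresh gradient of f_i at x (least squares)
        # Variance-reduced, unbiased gradient estimator (SAGA form):
        g = g_i - J[:, i] + J_mean
        # The sketch-and-project step degenerates to overwriting column i;
        # refresh the running column average before overwriting.
        J_mean = J_mean + (g_i - J[:, i]) / n
        J[:, i] = g_i
        x = x - step_size * g
    return x

# Illustrative usage on synthetic least-squares data (hypothetical example).
rng = np.random.default_rng(1)
A = rng.standard_normal((200, 10))
x_true = rng.standard_normal(10)
b = A @ x_true
x_hat = jacsketch_saga(A, b)
print("distance to minimizer:", np.linalg.norm(x_hat - x_true))

Maintaining the column average incrementally keeps the per-iteration cost at one gradient evaluation plus O(d) extra work, at the price of the O(nd) memory for the Jacobian estimate that SAGA-type methods require.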


Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/01d3/8550794/96cf66580763/10107_2020_1506_Fig1_HTML.jpg
