Liu Liu, Liu Xuanqing, Hsieh Cho-Jui, Tao Dacheng
IEEE Trans Neural Netw Learn Syst. 2025 Jan;36(1):1651-1663. doi: 10.1109/TNNLS.2023.3326177. Epub 2025 Jan 7.
Trust region (TR) and adaptive regularization using cubics (ARC) have appealing theoretical properties for nonconvex optimization: at each iterate they compute the function value, gradient, and Hessian matrix to obtain the next search direction and the adjusted parameters. Although stochastic approximation greatly reduces the computational cost, theoretically guaranteeing the convergence rate is challenging. In this article, we explore a family of stochastic TR (STR) and stochastic ARC (SARC) methods that simultaneously allow inexact computation of the Hessian matrix, the gradient, and the function value. Our algorithms require much less propagation overhead per iteration than TR and ARC. We prove that the iteration complexity to achieve ε-approximate second-order optimality is of the same order as that of the exact counterparts established in previous studies. In addition, the mild conditions on inexactness can be met by leveraging a random sampling technique in the finite-sum minimization problem. Numerical experiments on a nonconvex problem support these findings and demonstrate that, with the same or a similar number of iterations, our algorithms require less computational overhead per iteration than current second-order methods.
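As a rough illustration of the subsampling idea in the finite-sum setting, the following is a minimal sketch of an inexact trust-region step, not the authors' exact STR/SARC algorithm: the toy quadratic-sum objective, the minibatch sizes, and the radius-update constants are all illustrative assumptions.

```python
import numpy as np

# Sketch of a subsampled trust-region (STR) step for a finite-sum problem
# f(x) = (1/n) * sum_i f_i(x). The gradient and Hessian are estimated on
# independent random minibatches; all constants here are illustrative.

rng = np.random.default_rng(0)

# Toy nonconvex finite-sum problem: f_i(x) = 0.5 x^T A_i x + b_i^T x
n, d = 200, 5
A = rng.standard_normal((n, d, d))
A = 0.5 * (A + A.transpose(0, 2, 1))  # symmetrize each component
b = rng.standard_normal((n, d))

def sub_grad(x, idx):
    """Subsampled gradient over the minibatch idx."""
    return np.mean(A[idx] @ x + b[idx], axis=0)

def sub_hess(idx):
    """Subsampled Hessian over the minibatch idx."""
    return np.mean(A[idx], axis=0)

def f_full(x):
    """Exact objective (used here only for the acceptance test)."""
    return np.mean(0.5 * np.einsum('i,nij,j->n', x, A, x) + b @ x)

def tr_subproblem(g, H, radius):
    """Solve min_s g^T s + 0.5 s^T H s  s.t. ||s|| <= radius by bisection
    on the Lagrange multiplier (fine for small d; a sketch, not optimized)."""
    lam_lo = max(0.0, -np.linalg.eigvalsh(H)[0]) + 1e-12
    s = np.linalg.solve(H + lam_lo * np.eye(d), -g)
    if np.linalg.norm(s) <= radius:
        return s  # interior (Newton-like) solution
    lam_hi = lam_lo + 1.0
    while np.linalg.norm(np.linalg.solve(H + lam_hi * np.eye(d), -g)) > radius:
        lam_hi *= 2.0
    for _ in range(50):  # bisect until the step lies on the boundary
        lam = 0.5 * (lam_lo + lam_hi)
        s = np.linalg.solve(H + lam * np.eye(d), -g)
        if np.linalg.norm(s) > radius:
            lam_lo = lam
        else:
            lam_hi = lam
    return s

x, radius = rng.standard_normal(d), 1.0
for it in range(30):
    idx_g = rng.choice(n, size=64, replace=False)  # gradient minibatch
    idx_h = rng.choice(n, size=32, replace=False)  # smaller Hessian minibatch
    g, H = sub_grad(x, idx_g), sub_hess(idx_h)
    s = tr_subproblem(g, H, radius)
    pred = -(g @ s + 0.5 * s @ H @ s)       # model-predicted decrease
    actual = f_full(x) - f_full(x + s)      # could itself be subsampled
    rho = actual / max(pred, 1e-12)
    if rho > 0.1:                           # accept the step, grow the radius
        x, radius = x + s, min(2.0 * radius, 10.0)
    else:                                   # reject the step, shrink the radius
        radius *= 0.5
print("final f(x) =", f_full(x))
```

Here the acceptance ratio rho compares the actual decrease against the subsampled model's prediction; in the inexact setting of the paper, the function values in that test may also be approximated, subject to the stated inexactness conditions.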