Yu Tengteng, Liu Xin-Wei, Dai Yu-Hong, Sun Jie
IEEE Trans Neural Netw Learn Syst. 2021 Oct;32(10):4627-4638. doi: 10.1109/TNNLS.2020.3025383. Epub 2021 Oct 5.
We consider the problem of minimizing the sum of an average of a large number of smooth convex component functions and a possibly nonsmooth convex function that admits a simple proximal mapping. This class of problems arises frequently in machine learning, known as regularized empirical risk minimization (ERM). In this article, we propose mSRGTR-BB, a minibatch proximal stochastic recursive gradient algorithm, which employs a trust-region-like scheme to select stepsizes that are automatically computed by the Barzilai-Borwein method. We prove that mSRGTR-BB converges linearly in expectation for strongly and nonstrongly convex objective functions. With proper parameters, mSRGTR-BB enjoys a faster convergence rate than the state-of-the-art minibatch proximal variant of the semistochastic gradient method (mS2GD). Numerical experiments on standard data sets show that the performance of mSRGTR-BB is comparable to and sometimes even better than mS2GD with best-tuned stepsizes and is superior to some modern proximal stochastic gradient methods.
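To make the setting concrete, below is a minimal Python sketch of a minibatch proximal stochastic recursive (SARAH-style) gradient loop with a Barzilai-Borwein stepsize, applied to L1-regularized least squares as an instance of regularized ERM. This is an illustrative assumption, not the authors' exact mSRGTR-BB: the trust-region-like safeguarding of the stepsize is omitted, the BB1 rule and the scaling by the inner loop length follow common practice (as in SVRG-BB), and all function names (`msrg_bb_sketch`, `soft_threshold`, `full_grad`) are hypothetical.

```python
import numpy as np

# Minimal sketch (NOT the authors' exact mSRGTR-BB): minibatch proximal
# stochastic recursive gradient steps for L1-regularized least squares,
# with a Barzilai-Borwein (BB1) stepsize recomputed at each outer epoch.
# The trust-region-like stepsize safeguard from the paper is omitted.

def soft_threshold(x, t):
    """Proximal mapping of t * ||x||_1 (the 'simple proximal mapping')."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def full_grad(A, b, w):
    """Gradient of the smooth part f(w) = (1/2n) * ||A w - b||^2."""
    return A.T @ (A @ w - b) / A.shape[0]

def msrg_bb_sketch(A, b, lam=1e-3, epochs=20, inner=50, batch=10, seed=0):
    rng = np.random.default_rng(seed)
    n, d = A.shape
    w = np.zeros(d)
    eta = 0.1                         # initial stepsize before BB kicks in
    w_prev, g_prev = None, None
    for _ in range(epochs):
        g_full = full_grad(A, b, w)   # full gradient at the snapshot point
        if w_prev is not None:        # BB1 stepsize, scaled by inner loop length
            sk, yk = w - w_prev, g_full - g_prev
            if abs(sk @ yk) > 1e-12:
                eta = (sk @ sk) / (sk @ yk) / inner
        w_prev, g_prev = w.copy(), g_full.copy()
        x, x_old, v = w.copy(), w.copy(), g_full.copy()
        for _ in range(inner):
            idx = rng.choice(n, size=batch, replace=False)
            Ai, bi = A[idx], b[idx]
            # recursive estimator: v_t = grad_B(x_t) - grad_B(x_{t-1}) + v_{t-1}
            v = Ai.T @ (Ai @ x - bi) / batch \
                - Ai.T @ (Ai @ x_old - bi) / batch + v
            x_old = x.copy()
            x = soft_threshold(x - eta * v, eta * lam)   # proximal step
        w = x
    return w

if __name__ == "__main__":
    # Tiny usage example on synthetic data
    rng = np.random.default_rng(1)
    A = rng.standard_normal((200, 50))
    b = A @ rng.standard_normal(50) + 0.01 * rng.standard_normal(200)
    w_hat = msrg_bb_sketch(A, b)
    obj = 0.5 * np.mean((A @ w_hat - b) ** 2) + 1e-3 * np.abs(w_hat).sum()
    print("objective:", obj)
```

The key design point the sketch tries to convey is that the stepsize is not hand-tuned: it is recomputed from the last two snapshot iterates and full gradients via the Barzilai-Borwein formula, which is what allows methods of this type to be competitive with best-tuned fixed stepsizes.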