Zhou Pan, Yuan Xiaotong, Lin Zhouchen, Hoi Steven
IEEE Trans Pattern Anal Mach Intell. 2021 Jun 8;PP. doi: 10.1109/TPAMI.2021.3087328.
Despite the success of stochastic variance-reduced gradient (SVRG) algorithms in solving large-scale problems, their stochastic gradient complexity often scales linearly with the data size and becomes expensive for huge datasets. Accordingly, we propose a hybrid stochastic-deterministic minibatch proximal gradient (HSDMPG) algorithm for strongly convex problems with a linear prediction structure, e.g., least squares and logistic/softmax regression. HSDMPG enjoys improved computational complexity that is data-size-independent for large-scale problems. It iteratively samples an evolving minibatch of individual losses to approximate the original problem, and efficiently minimizes the sampled smaller-sized subproblems. For a strongly convex loss of n components, HSDMPG attains an ϵ-optimization error within [Formula: see text] stochastic gradient evaluations, where κ is the condition number, ζ = 1 for quadratic loss and ζ = 2 for generic loss. For large-scale problems, our complexity outperforms those of SVRG-type algorithms with or without dependence on the data size. In particular, when ϵ = O(1/√n), which matches the intrinsic excess error of a learning model and is sufficient for generalization, our complexity for quadratic and generic losses is respectively O(n^0.875 log^1.5(n)) and O(n^0.875 log^2.25(n)), which for the first time achieves optimal generalization in less than a single pass over the data. Moreover, we extend HSDMPG to online strongly convex problems and prove its higher efficiency over prior algorithms. Numerical results demonstrate the computational advantages of HSDMPG.
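As a rough illustration of the sample-then-solve scheme sketched in the abstract, the following Python snippet applies an evolving-minibatch proximal idea to ridge-regularized least squares. This is a minimal sketch under stated assumptions, not the authors' algorithm: the function name hsdmpg_sketch, the geometric minibatch growth schedule, the decaying proximal weight mu, and the inner gradient-descent solver are all illustrative choices.

import numpy as np

def hsdmpg_sketch(X, y, lam=1e-2, outer_iters=10, inner_iters=50, seed=0):
    """Evolving-minibatch proximal-gradient sketch for ridge least squares (illustrative)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    theta = np.zeros(d)
    batch = max(1, n // 2 ** outer_iters)  # assumed geometric minibatch growth
    for t in range(outer_iters):
        anchor = theta.copy()              # proximal center for this round
        mu = 1.0 / (t + 1)                 # assumed decaying proximal weight
        idx = rng.choice(n, size=min(batch, n), replace=False)
        Xb, yb = X[idx], y[idx]
        m = len(idx)
        # Smoothness constant of the sampled subproblem:
        # (1/2m)||Xb th - yb||^2 + (lam/2)||th||^2 + (mu/2)||th - anchor||^2
        L = np.linalg.norm(Xb, 2) ** 2 / m + lam + mu
        for _ in range(inner_iters):       # inexact inner solve via plain gradient descent
            grad = Xb.T @ (Xb @ theta - yb) / m + lam * theta + mu * (theta - anchor)
            theta -= grad / L
        batch = min(n, 2 * batch)          # grow the minibatch for the next round
    return theta

# Toy usage: recover a planted parameter vector from noisy linear measurements.
rng = np.random.default_rng(1)
X = rng.standard_normal((1000, 20))
theta_star = rng.standard_normal(20)
y = X @ theta_star + 0.01 * rng.standard_normal(1000)
theta_hat = hsdmpg_sketch(X, y)

The design choice illustrated here is the hybrid character of the method: each outer round only touches a sampled subset of the n losses (stochastic), while the subproblem built from that subset is solved to reasonable accuracy with deterministic gradient steps around a proximal anchor.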