IEEE Trans Pattern Anal Mach Intell. 2017 Jun;39(6):1223-1236. doi: 10.1109/TPAMI.2016.2578323. Epub 2016 Jun 8.
In this paper, we study Stochastic Composite Optimization (SCO) for sparse learning, which aims to learn a sparse solution by minimizing a composite function. Most recent SCO algorithms already attain the optimal expected convergence rate O(1/(λT)), but they often fail to deliver sparse solutions in the end, either because sparsity regularization is only weakly enforced during stochastic optimization (SO) or because of limitations in the online-to-batch conversion. Even when the objective function is strongly convex, their high-probability bounds only attain O(√(log(1/δ)/T)), where δ is the failure probability, which is much worse than the expected convergence rate. To address these limitations, we propose a simple yet effective two-phase Stochastic Composite Optimization scheme that adds a novel and powerful sparse online-to-batch conversion to general Stochastic Optimization algorithms. We further develop three concrete algorithms under this scheme, OptimalSL, LastSL, and AverageSL, to demonstrate its effectiveness. Both the theoretical analysis and the experimental results show that our methods outperform existing methods in sparse learning ability, while improving the high-probability bound to approximately O(log(log(T)/δ)/(λT)).
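The composite setting the abstract refers to is min_w f(w) + λ‖w‖₁, where f is a smooth loss accessed through stochastic gradients. The sketch below is illustrative only and is not the paper's OptimalSL, LastSL, or AverageSL: it runs plain proximal stochastic gradient descent with ℓ1 soft-thresholding (all names such as proximal_sgd and grad_sample, and the toy regression problem, are our own assumptions). It also shows the failure mode the abstract points at: the prox keeps each iterate sparse, but a naive online-to-batch conversion by iterate averaging produces a dense output.

```python
import numpy as np

def soft_threshold(x, tau):
    """Proximal operator of tau * ||.||_1 (component-wise soft-thresholding)."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def proximal_sgd(grad_sample, w0, lam, eta0, T, rng):
    """Proximal SGD for min_w f(w) + lam * ||w||_1 (illustrative sketch).

    grad_sample(w, rng) returns an unbiased stochastic gradient of the
    smooth part f at w. Returns both the last iterate and the running
    average, to contrast their sparsity.
    """
    w = w0.copy()
    w_avg = np.zeros_like(w0)
    for t in range(1, T + 1):
        eta = eta0 / t                               # 1/t step, strongly convex case
        g = grad_sample(w, rng)
        w = soft_threshold(w - eta * g, eta * lam)   # prox step zeroes small coords
        w_avg += (w - w_avg) / t                     # incremental average of iterates
    return w, w_avg

# Hypothetical toy problem: f(w) = E[(x^T w - y)^2] / 2 with a sparse ground truth.
rng = np.random.default_rng(0)
d = 50
w_true = np.zeros(d)
w_true[:5] = 1.0

def grad_sample(w, rng):
    x = rng.standard_normal(d)
    y = x @ w_true + 0.1 * rng.standard_normal()
    return (x @ w - y) * x                           # stochastic gradient of the squared loss

w_last, w_avg = proximal_sgd(grad_sample, np.zeros(d), lam=0.05,
                             eta0=0.5, T=20000, rng=rng)
print("nonzeros, last iterate:", np.count_nonzero(w_last))
print("nonzeros, averaged    :", np.count_nonzero(w_avg))
```

On a run like this, the last prox iterate typically has exact zeros while the averaged iterate is dense everywhere, which is precisely the gap a sparse online-to-batch conversion, such as the one proposed in the paper, is designed to close.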