IEEE Trans Neural Netw Learn Syst. 2015 Dec;26(12):3163-75. doi: 10.1109/TNNLS.2015.2406759. Epub 2015 Mar 10.
Multitask learning (MTL) aims at improving the generalization performance of multiple tasks by exploiting the shared factors among them. An important line of research in MTL is robust MTL (RMTL), which uses trace-norm regularization to capture task relatedness via a low-rank structure. The existing algorithms for the RMTL optimization problems rely on the accelerated proximal gradient (APG) scheme, which needs repeated full singular value decomposition (SVD) operations. However, the time complexity of a full SVD is O(min(md², m²d)) for an RMTL problem with m tasks and d features, which becomes unaffordable in real-world MTL applications that often have a large number of tasks and high-dimensional features. In this paper, we propose a scalable solution for large-scale RMTL, with either the least squares loss or the squared hinge loss, by a divide-and-conquer method. The proposed method divides the original RMTL problem into several size-reduced subproblems, solves these cheaper subproblems in parallel by any base algorithm (e.g., APG) for RMTL, and then combines the results to obtain the final solution. Our theoretical analysis indicates that, with high probability, the recovery errors of the proposed divide-and-conquer algorithm are bounded by those of the base algorithm. Furthermore, to solve the subproblems with the least squares loss or the squared hinge loss, we propose two efficient base algorithms, one for each loss, based on the linearized alternating direction method. Experimental results demonstrate that, with little loss of accuracy, our method is substantially faster than the state-of-the-art APG algorithms for RMTL.
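The divide-solve-combine scheme described above can be sketched as follows. This is a hypothetical illustration, not the paper's implementation: the tasks are partitioned into groups, each size-reduced subproblem is solved independently (here by a stand-in ridge-regularized least squares solver, where the paper would use APG or the proposed linearized alternating direction method with trace-norm regularization), and the per-group solutions are combined column-wise into the full weight matrix.

```python
import numpy as np

def solve_subproblem(X, Y, lam=0.1):
    # Stand-in base solver: ridge-regularized least squares for one task group.
    # The paper's base algorithms (APG or the linearized alternating direction
    # method with trace-norm regularization) would be plugged in here instead.
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ Y)

def divide_and_conquer_rmtl(X, Y, num_groups=4, lam=0.1):
    # X: (n, d) shared feature matrix; Y: (n, m) targets for m tasks.
    m = Y.shape[1]
    groups = np.array_split(np.arange(m), num_groups)
    # Solve each size-reduced subproblem (these could run in parallel).
    blocks = [solve_subproblem(X, Y[:, g], lam) for g in groups]
    # Combine: reassemble the full (d, m) weight matrix from the group solutions.
    W = np.empty((X.shape[1], m))
    for g, B in zip(groups, blocks):
        W[:, g] = B
    return W
```

Note that with this separable ridge stand-in, the combined solution coincides exactly with solving all tasks jointly; under the paper's trace-norm regularizer the subproblems are genuinely coupled, which is why the recovery-error analysis of the combination step is needed.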