City University of Hong Kong Shenzhen Research Institute, Shenzhen, China; Department of Mathematics, City University of Hong Kong, Hong Kong, China.
Department of Mathematics, City University of Hong Kong, Hong Kong, China; School of Statistics, Renmin University of China, Beijing, China.
Neural Netw. 2021 Nov;143:368-376. doi: 10.1016/j.neunet.2021.06.020. Epub 2021 Jun 25.
We study distributed learning for regularized least squares regression in a reproducing kernel Hilbert space (RKHS). The divide-and-conquer strategy is a frequently used approach for handling very large data sets: it computes an estimator on each subset and then averages these local estimators. Existing theoretical constraints on the number of subsets imply that each subset can still be large. Random sketching can therefore be applied when producing the local estimator on each subset, further reducing computation compared with vanilla divide-and-conquer. In this setting, sketching and divide-and-conquer are complementary in dealing with the large sample size. We show that optimal learning rates can be retained. Simulations are performed to compare the sketched and non-sketched divide-and-conquer methods.
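As an illustration of the scheme described above, the following is a minimal sketch (not the authors' implementation): data are split into subsets, a sketched kernel ridge regression estimator is computed on each subset by restricting the coefficient vector to the row space of a Gaussian sketch matrix, and the local predictors are averaged. The RBF kernel, the Gaussian sketch, and all parameter names (`lam`, `m`, `gamma`) are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    """Pairwise RBF (Gaussian) kernel matrix between rows of X and Y."""
    d = np.sum(X**2, 1)[:, None] + np.sum(Y**2, 1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-gamma * d)

def sketched_krr_fit(X, y, lam, m, gamma=1.0, seed=None):
    """Sketched KRR on one subset (illustrative).

    With kernel matrix K and Gaussian sketch S (m x n), solve
        min_b ||K S^T b - y||^2 + n*lam * b^T (S K S^T) b,
    i.e. restrict the representer coefficients to span(S^T).
    """
    rng = np.random.default_rng(seed)
    n = len(X)
    K = rbf_kernel(X, X, gamma)
    S = rng.standard_normal((m, n)) / np.sqrt(m)  # Gaussian sketch matrix
    KS = K @ S.T                                  # n x m
    # Normal equations: (S K K S^T + n*lam * S K S^T) b = S K y
    A = KS.T @ KS + n * lam * (S @ KS)
    b = np.linalg.solve(A, KS.T @ y)
    alpha = S.T @ b                               # coefficients in R^n
    return X, alpha

def dc_sketched_krr(X, y, lam, m, n_subsets, gamma=1.0, seed=0):
    """Divide-and-conquer: fit sketched KRR on each subset, average predictors."""
    perm = np.random.default_rng(seed).permutation(len(X))
    models = [sketched_krr_fit(X[idx], y[idx], lam, m, gamma, seed=seed + k)
              for k, idx in enumerate(np.array_split(perm, n_subsets))]
    def predict(Xt):
        # Average the local predictors f_k(x) = sum_i alpha_i k(x, x_i)
        return np.mean([rbf_kernel(Xt, Xs, gamma) @ a for Xs, a in models], axis=0)
    return predict
```

Here sketching shrinks the per-subset linear solve from n x n to m x m (with m well below the subset size), while averaging over subsets handles the overall sample size, which is the complementarity the abstract refers to.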