Tavassolipour Mostafa, Motahari Seyed Abolfazl, Shalmani Mohammad Taghi Manzuri
IEEE Trans Pattern Anal Mach Intell. 2020 Aug;42(8):1928-1941. doi: 10.1109/TPAMI.2019.2906207. Epub 2019 Mar 19.
It is of fundamental importance to find algorithms that achieve optimal performance for learning statistical models in distributed, communication-limited systems. Aiming to characterize the optimal strategies, we consider learning of Gaussian Processes (GP) in distributed systems as a pivotal example. We first address a very basic problem: how many bits are required to estimate the inner products of Gaussian vectors held by distributed machines? Using information-theoretic bounds, we obtain an optimal solution to the problem based on vector quantization. Two suboptimal but more practical schemes are also presented as substitutes for the vector quantization scheme. In particular, we show that one of the practical schemes, called per-symbol quantization, performs very close to the optimal one. The schemes developed for inner-product calculation are incorporated into our proposed distributed GP learning methods. Experimental results show that by spending only a few bits per symbol in our communication scheme, the proposed methods outperform previous zero-rate distributed GP learning schemes such as the Bayesian Committee Machine (BCM) and Product of Experts (PoE).
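The per-symbol quantization idea from the abstract can be illustrated with a minimal sketch: each machine quantizes every entry of its Gaussian vector independently with a b-bit uniform scalar quantizer and transmits only the quantized symbols, after which the inner product is computed from the reconstructions. This is an illustrative assumption, not the paper's exact scheme (the quantizer design, clipping range `clip`, and the helper `per_symbol_quantize` are all hypothetical choices for this sketch):

```python
import numpy as np

def per_symbol_quantize(x, bits, clip=3.0):
    """Quantize each symbol of x to `bits` bits with a uniform
    quantizer over [-clip, clip]; values outside are saturated.
    Each symbol is mapped to the center of its quantization cell."""
    levels = 2 ** bits
    step = 2.0 * clip / levels
    idx = np.clip(np.floor((x + clip) / step), 0, levels - 1)
    return -clip + (idx + 0.5) * step

rng = np.random.default_rng(0)
d = 10_000
x = rng.standard_normal(d)  # Gaussian vector held by machine A
y = rng.standard_normal(d)  # Gaussian vector held by machine B

exact = x @ y
# Each machine communicates only `bits` bits per symbol;
# the inner product is estimated from the reconstructed vectors.
approx = per_symbol_quantize(x, bits=4) @ per_symbol_quantize(y, bits=4)

# Per-dimension error of the estimate; shrinks as `bits` grows.
rel_err = abs(approx - exact) / d
```

Even at 4 bits per symbol the per-dimension error is small, which matches the abstract's claim that a few bits per symbol already beat zero-rate (no-communication) aggregation schemes.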