Berns Fabian, Hüwel Jan, Beecks Christian
University of Hagen, Hagen, Germany.
Fraunhofer Institute for Applied Information Technology FIT, Sankt Augustin, Germany.
SN Comput Sci. 2022;3(4):300. doi: 10.1007/s42979-022-01186-x. Epub 2022 May 21.
Gaussian process models (GPMs) are widely regarded as a prominent tool for learning statistical data models that enable interpolation, regression, and classification. These models are typically instantiated by a Gaussian process with a zero-mean function and a radial basis covariance function. While these default instantiations yield acceptable model accuracy, GPM inference algorithms instead search automatically for an application-specific model that fits a particular dataset. State-of-the-art methods for automated inference of GPMs search the space of possible models in a rather intricate way and thus incur super-quadratic time complexity for model selection and evaluation. Since this cost limits them to small datasets with low statistical versatility, various methods and algorithms using global as well as local approximations have been proposed for efficient inference of large-scale GPMs. Local approximations represent the data via local sub-models, whereas global approaches capture the data's inherent characteristics by means of an educated sample. In this paper, we investigate the current state of the art in automated model inference for Gaussian processes and outline the strengths and shortcomings of the respective approaches. A performance analysis backs our theoretical findings and provides further empirical evidence. It indicates that approximate inference algorithms, especially locally approximating ones, deliver superior runtime performance while maintaining the quality level of those using non-approximative Gaussian processes.
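To make the contrast between a default instantiation and a local approximation concrete, the following is a minimal sketch, not the algorithms evaluated in the paper. It assumes one-dimensional inputs, fixed hyperparameters (`lengthscale`, `variance`, `noise`), and a simple partition of the sorted inputs into contiguous chunks. All function names and the synthetic sine data are hypothetical and chosen only for illustration.

```python
import numpy as np

def rbf_kernel(a, b, lengthscale=1.0, variance=1.0):
    """Radial basis (squared exponential) covariance between 1-D input arrays."""
    sq_dist = (a[:, None] - b[None, :]) ** 2
    return variance * np.exp(-0.5 * sq_dist / lengthscale**2)

def gp_posterior_mean(x_train, y_train, x_test, noise=1e-2):
    """Posterior mean of a zero-mean GP; the Cholesky step is cubic in len(x_train)."""
    K = rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    return rbf_kernel(x_test, x_train) @ alpha

def local_gp_posterior_mean(x_train, y_train, x_test, n_chunks=4, noise=1e-2):
    """Local approximation (assumed scheme): independent sub-models on contiguous chunks."""
    order = np.argsort(x_train)
    chunks = np.array_split(order, n_chunks)
    # Largest input of every chunk except the last, used to route test points.
    edges = np.array([x_train[c[-1]] for c in chunks[:-1]])
    assignment = np.searchsorted(edges, x_test)
    pred = np.empty_like(x_test, dtype=float)
    for i, c in enumerate(chunks):
        mask = assignment == i
        if mask.any():
            pred[mask] = gp_posterior_mean(x_train[c], y_train[c], x_test[mask], noise)
    return pred

# Synthetic data, purely illustrative.
rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0.0, 10.0, 200))
y = np.sin(x) + 0.05 * rng.standard_normal(200)
x_new = np.linspace(0.5, 9.5, 50)

exact = gp_posterior_mean(x, y, x_new)        # one model over all 200 points
local = local_gp_posterior_mean(x, y, x_new)  # four sub-models of about 50 points each
```

Each sub-model factorizes a much smaller covariance matrix, which is where a local approximation saves runtime. A global approximation would instead fit a single model to a selected subset of the data.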