Antonian Edward, Peters Gareth W, Chantler Michael
School of Mathematics and Computer Science, Heriot-Watt University, Edinburgh, United Kingdom.
Department of Statistics and Applied Probability, University of California Santa Barbara, Santa Barbara, California, United States of America.
PLoS One. 2025 May 30;20(5):e0324087. doi: 10.1371/journal.pone.0324087. eCollection 2025.
In this paper, we study a class of non-parametric regression models for predicting graph signals [Formula: see text] as a function of explanatory variables [Formula: see text]. Recently, Kernel Graph Regression (KGR) and Gaussian Processes over Graph (GPoG) have emerged as promising techniques for this task. The goal of this paper is to examine several extensions to KGR/GPoG, with the aim of generalising them to a wider variety of data scenarios. The first extension we consider is the case of graph signals that have only been partially recorded, meaning a subset of their elements is missing at observation time. Next, we examine the statistical effect of correlated prediction error and propose a method for Generalized Least Squares (GLS) on graphs. In particular, we examine AR(1) vector autoregressive processes, which are commonly found in time-series applications. Finally, we use the Laplace approximation to determine a lower bound for the out-of-sample prediction error and derive a scalable expression for the marginal variance of each prediction. These methods are tested on both real and synthetic data, with the former taken from a network of air quality monitoring stations across California. We find evidence that the generalised GLS-KGR algorithm is well-suited to such time-series applications, outperforming several standard techniques on this dataset.
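To make the GLS idea mentioned in the abstract concrete, the following is a minimal sketch of textbook Generalized Least Squares with AR(1)-correlated errors. It is not the GLS-KGR algorithm of the paper: it assumes an ordinary linear model, a single scalar time series rather than a graph signal, and a known autocorrelation coefficient. The names `ar1_covariance`, `gls_fit`, `rho`, and `sigma2` are hypothetical and chosen only for illustration.

```python
import numpy as np


def ar1_covariance(T, rho, sigma2=1.0):
    """Covariance of a stationary scalar AR(1) process of length T.

    Entry (s, t) equals sigma2 * rho**|s - t| / (1 - rho**2), where sigma2 is
    the innovation variance and |rho| < 1 (both assumed known here).
    """
    idx = np.arange(T)
    lags = np.abs(idx[:, None] - idx[None, :])
    return sigma2 * rho ** lags / (1.0 - rho ** 2)


def gls_fit(X, y, Sigma):
    """GLS estimate: beta = (X' Sigma^-1 X)^-1 X' Sigma^-1 y.

    Linear solves are used instead of forming Sigma^-1 explicitly.
    """
    Si_X = np.linalg.solve(Sigma, X)
    Si_y = np.linalg.solve(Sigma, y)
    return np.linalg.solve(X.T @ Si_X, X.T @ Si_y)


if __name__ == "__main__":
    # Synthetic illustration only; none of these numbers come from the paper.
    rng = np.random.default_rng(0)
    T, rho = 200, 0.6
    X = rng.normal(size=(T, 3))
    beta_true = np.array([1.0, -2.0, 0.5])

    # Simulate AR(1) errors by colouring white noise with a Cholesky factor.
    Sigma = ar1_covariance(T, rho)
    errors = np.linalg.cholesky(Sigma) @ rng.normal(size=T)
    y = X @ beta_true + errors

    print("GLS estimate:", gls_fit(X, y, Sigma))
```

Under these assumptions, weighting by the inverse error covariance is what distinguishes GLS from ordinary least squares, which would treat the correlated errors as independent.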