Kraus David
Department of Mathematics and Statistics, Masaryk University, Brno, Czechia.
J Appl Stat. 2024 Oct 28;52(6):1258-1277. doi: 10.1080/02664763.2024.2420223. eCollection 2025.
We revisit the classic situation in functional data analysis in which curves are observed with noise at discrete, possibly sparse and irregular, arguments. We focus on the reconstruction of individual curves by prediction intervals and bands. The standard approach consists of two steps. First, one estimates the mean and covariance functions of the curves and the observation noise variance function by, e.g., penalized splines. Second, under Gaussian assumptions, one derives the conditional distribution of a curve given the observed data and constructs prediction sets with the required properties, usually by sampling from the predictive distribution. This approach is well established, commonly used and theoretically valid, but in practice it surprisingly fails in its key property: prediction sets constructed this way often do not attain the required coverage, and the actual coverage is lower than the nominal one. We investigate the cause of this issue and propose a computationally feasible remedy that leads to prediction regions with much better coverage. Our method accounts for the uncertainty of the predictive model by sampling from the approximate distribution of its spline estimators, whose covariance is estimated by a novel sandwich estimator. Our approach also applies to the important case of covariate-adjusted models.
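The second step of the standard approach described above can be illustrated with a minimal sketch. It is not the paper's code or its proposed remedy: it treats the mean function, covariance function and noise variance as known (plug-in) quantities, which is precisely the simplification the abstract identifies as a source of under-coverage. All function names are hypothetical, and the squared-exponential covariance is only a stand-in for a covariance function estimated by penalized splines.

```python
import numpy as np

def sq_exp_cov(s, t, variance=1.0, length_scale=0.2):
    """Squared-exponential covariance; an assumed stand-in for an estimated covariance."""
    d = s[:, None] - t[None, :]
    return variance * np.exp(-0.5 * (d / length_scale) ** 2)

def predictive_distribution(t_obs, y_obs, t_grid, mean_fn, cov_fn, noise_var):
    """Gaussian conditional mean and covariance of the curve on t_grid given noisy data."""
    K_oo = cov_fn(t_obs, t_obs) + noise_var * np.eye(len(t_obs))  # covariance of observations
    K_go = cov_fn(t_grid, t_obs)                                  # grid/observation cross-covariance
    K_gg = cov_fn(t_grid, t_grid)                                 # prior covariance on the grid
    resid = y_obs - mean_fn(t_obs)
    cond_mean = mean_fn(t_grid) + K_go @ np.linalg.solve(K_oo, resid)
    cond_cov = K_gg - K_go @ np.linalg.solve(K_oo, K_go.T)
    return cond_mean, 0.5 * (cond_cov + cond_cov.T)               # symmetrize against round-off

def simultaneous_band(cond_mean, cond_cov, level=0.95, n_draws=2000, rng=None):
    """Simultaneous band from the quantile of the max standardized deviation of draws."""
    rng = np.random.default_rng(rng)
    sd = np.sqrt(np.clip(np.diag(cond_cov), 0.0, None))
    # Eigendecomposition-based square root tolerates a near-singular covariance.
    w, V = np.linalg.eigh(cond_cov)
    root = V * np.sqrt(np.clip(w, 0.0, None))
    dev = rng.standard_normal((n_draws, len(w))) @ root.T         # centered predictive draws
    safe_sd = np.where(sd > 0, sd, 1.0)
    max_abs = np.max(np.abs(dev) / safe_sd, axis=1)
    c = np.quantile(max_abs, level)                               # simultaneous critical value
    return cond_mean - c * sd, cond_mean + c * sd, c
```

Because the critical value is taken from the maximum over the grid, it exceeds the pointwise normal quantile, so the band is wider than a collection of pointwise intervals. The band still conditions on fixed model components; accounting for their estimation uncertainty, as the abstract proposes, would require additionally sampling those components from an approximate distribution of their estimators.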