Meng Cheng, Yu Jun, Chen Yongkai, Zhong Wenxuan, Ma Ping
Institute of Statistics and Big Data, Renmin University of China.
School of Mathematics and Statistics, Beijing Institute of Technology.
J Comput Graph Stat. 2022;31(3):802-812. doi: 10.1080/10618600.2021.2002161. Epub 2022 Jan 12.
Smoothing splines are used pervasively in nonparametric regression. However, their computational burden is significant when the sample size $n$ is large. When the number of predictors $d \geq 2$, the computational cost of smoothing splines is of the order $O(n^3)$ using the standard approach. Many methods have been developed to approximate smoothing spline estimators by using $q$ basis functions instead of $n$ ones, resulting in a computational cost of the order $O(nq^2)$. These methods are called basis selection methods. Despite their algorithmic benefits, most basis selection methods require the assumption that the sample is uniformly distributed on a hypercube, and their performance may deteriorate when this assumption is not met. To overcome this obstacle, we develop an efficient algorithm that is adaptive to the unknown probability density function of the predictors. Theoretically, we show that the proposed estimator has the same convergence rate as the full-basis estimator when $q$ is roughly of the order $O[n^{2d/\{(pr+1)(d+2)\}}]$, where $p \in [1, 2]$ and $r \approx 4$ are constants that depend on the type of spline. Numerical studies on various synthetic datasets demonstrate the superior performance of the proposed estimator in comparison with mainstream competitors.
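The general basis selection idea can be illustrated with a minimal sketch: fit a penalized least-squares estimator on q basis functions, with q much smaller than the sample size n, so that the dominant cost is forming a q-by-q system from an n-by-q matrix. This is not the adaptive algorithm proposed in the paper. It selects the basis by uniform random sampling, substitutes a Gaussian kernel for a spline reproducing kernel, and omits the null-space (polynomial) terms of a true smoothing spline. The function names, the bandwidth, and the smoothing parameter `lam` are illustrative assumptions.

```python
import numpy as np


def gaussian_kernel(a, b, bandwidth=0.2):
    """Stand-in kernel (an assumption); a smoothing spline would use a
    spline reproducing kernel instead."""
    sq = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * bandwidth**2))


def fit_reduced_basis(x, y, q, lam=1e-4, seed=0):
    """Fit with q basis functions centered at uniformly sampled observations.

    Minimizes (1/n) * ||y - R_nq c||^2 + lam * c' R_qq c over c.
    Uniform sampling is the simple baseline, not the paper's adaptive rule.
    """
    rng = np.random.default_rng(seed)
    n = x.shape[0]
    idx = rng.choice(n, size=q, replace=False)
    centers = x[idx]

    r_nq = gaussian_kernel(x, centers)        # n x q design matrix
    r_qq = gaussian_kernel(centers, centers)  # q x q penalty matrix

    # Normal equations: forming r_nq.T @ r_nq costs O(n q^2).
    lhs = r_nq.T @ r_nq + n * lam * r_qq
    rhs = r_nq.T @ y

    # Least-squares solve tolerates the ill-conditioning of kernel matrices.
    coef = np.linalg.lstsq(lhs, rhs, rcond=None)[0]
    return centers, coef


def predict(x_new, centers, coef):
    """Evaluate the fitted function at new predictor values."""
    return gaussian_kernel(x_new, centers) @ coef
```

Calling `fit_reduced_basis(x, y, q=50)` on an `(n, d)` predictor array and a length-`n` response returns the selected centers and their coefficients, which `predict` then uses to evaluate the fit. Replacing the uniform draw of `idx` with a density-adaptive rule is where a method such as the one described here would differ.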