Sun Xiaoxiao, Du Pang, Wang Xiao, Ma Ping
Department of Statistics, University of Georgia.
Department of Statistics, Virginia Tech.
J Am Stat Assoc. 2018;113(524):1601-1611. doi: 10.1080/01621459.2017.1356320. Epub 2018 Jun 19.
Many scientific studies collect data where the response and predictor variables are both functions of time, location, or some other covariate. Understanding the relationship between these functional variables is a common goal in these studies. Motivated by two real-life examples, we present in this paper a function-on-function regression model that can be used to analyze this kind of functional data. Our estimator of the 2D coefficient function is the optimizer of a form of penalized least squares where the penalty enforces a certain level of smoothness on the estimator. Our first result is the Representer Theorem, which states that the exact optimizer of the penalized least squares actually resides in a data-adaptive finite-dimensional subspace, although the optimization problem is defined on a function space of infinite dimensions. This theorem then allows us to incorporate Gaussian quadrature easily into the optimization of the penalized least squares, which can be carried out through standard numerical procedures. We also show that our estimator achieves the minimax convergence rate in mean prediction under the framework of function-on-function regression. Extensive simulation studies, in which a sparse functional data extension is also introduced, demonstrate the numerical advantages of our method over existing ones. The proposed method is then applied to our motivating examples of the benchmark Canadian weather data and a histone regulation study.
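To illustrate the role Gaussian quadrature can play here, the following is a minimal sketch, not the authors' implementation. It assumes the common function-on-function form y(t) = α(t) + ∫ β(t, s) x(s) ds + ε(t) on the domain [0, 1], which the abstract does not state explicitly, and it approximates only the integral term with Gauss-Legendre nodes. The names `gauss_legendre_01` and `integral_term`, and the toy choices β(t, s) = t·s and x(s) = s², are hypothetical.

```python
import numpy as np


def gauss_legendre_01(n):
    """Return n Gauss-Legendre nodes and weights rescaled from [-1, 1] to [0, 1]."""
    nodes, weights = np.polynomial.legendre.leggauss(n)
    return 0.5 * (nodes + 1.0), 0.5 * weights


def integral_term(beta, x, t, n=5):
    """Approximate the integral of beta(t, s) * x(s) over s in [0, 1] for each t."""
    s, w = gauss_legendre_01(n)
    # Weighted sum over the quadrature nodes replaces the integral over s.
    return np.array([np.sum(w * beta(ti, s) * x(s)) for ti in np.atleast_1d(t)])


# Toy (hypothetical) coefficient surface and predictor curve.
beta = lambda t, s: t * s
x = lambda s: s ** 2

t_grid = np.array([0.0, 0.5, 1.0])
approx = integral_term(beta, x, t_grid)

# For this toy choice the exact integral is t / 4.
exact = t_grid / 4.0
print(np.max(np.abs(approx - exact)))
```

Because the toy integrand is a cubic polynomial in s, a 5-point Gauss-Legendre rule integrates it exactly up to floating-point error. In a penalized least squares fit, the same nodes and weights would turn each integral into a finite weighted sum over the unknown coefficient values.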