Chen Yakuan, Goldsmith Jeff, Ogden Todd
Department of Biostatistics, Mailman School of Public Health, Columbia University.
Stat (Int Stat Inst). 2016;5(1):88-101. doi: 10.1002/sta4.106. Epub 2016 Mar 2.
For regression models with functional responses and scalar predictors, it is common for the number of predictors to be large. Despite this, few methods for variable selection exist for function-on-scalar models, and none account for the inherent correlation of residual curves in such models. By expanding the coefficient functions using a B-spline basis, we pose the function-on-scalar model as a multivariate regression problem. Spline coefficients are grouped within coefficient function, and group-minimax concave penalty (MCP) is used for variable selection. We adapt techniques from generalized least squares to account for residual covariance by "pre-whitening" using an estimate of the covariance matrix, and establish theoretical properties for the resulting estimator. We further develop an iterative algorithm that alternately updates the spline coefficients and covariance; simulation results indicate that this iterative algorithm often performs as well as pre-whitening using the true covariance, and substantially outperforms methods that neglect the covariance structure. We apply our method to two-dimensional planar reaching motions in a study of the effects of stroke severity on motor control, and find that our method provides lower prediction errors than competing methods.
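The pipeline described in the abstract (basis expansion, pre-whitening, group MCP, alternating updates) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. It assumes the basis matrix Theta (D grid points by K basis functions) is supplied by the caller, solves the group-MCP problem with a plain proximal-gradient loop chosen here for brevity, and re-estimates the covariance from a ridge-regularised sample covariance of the residuals. All function names, the ridge term, and the default tuning values are hypothetical.

```python
import numpy as np

def prewhiten(Y, Theta, Sigma):
    """Return (Y_star, Theta_star) whose residual rows have identity covariance."""
    L = np.linalg.cholesky(Sigma)            # Sigma = L @ L.T
    Y_star = np.linalg.solve(L, Y.T).T       # each curve y_i -> L^{-1} y_i
    Theta_star = np.linalg.solve(L, Theta)   # basis -> L^{-1} Theta
    return Y_star, Theta_star

def group_mcp_prox(z, lam, gamma, step):
    """Minimise ||b - z||^2 / (2*step) + MCP(||b||; lam, gamma); needs gamma > step."""
    norm = np.linalg.norm(z)
    if norm <= step * lam:                   # whole group is set to zero
        return np.zeros_like(z)
    if norm <= gamma * lam:                  # group is shrunk, less than under a lasso
        return z * (1.0 - step * lam / norm) / (1.0 - step / gamma)
    return z.copy()                          # large groups are left unpenalised

def fit_group_mcp(X, Y, Theta, lam, gamma=3.0, n_iter=500):
    """Proximal-gradient fit of Y ~ X @ B @ Theta.T with one group per predictor."""
    n, p = X.shape
    K = Theta.shape[1]
    Z = np.kron(X, Theta)                    # columns j*K:(j+1)*K belong to predictor j
    y = Y.reshape(-1)                        # row-major stacking matches np.kron rows
    N = y.size
    lipschitz = np.linalg.norm(Z, 2) ** 2 / N
    step = min(1.0 / lipschitz, 0.5 * gamma) # keep step < gamma so the prox is valid
    beta = np.zeros(p * K)
    for _ in range(n_iter):
        z = beta - step * (Z.T @ (Z @ beta - y)) / N
        for j in range(p):
            g = slice(j * K, (j + 1) * K)
            beta[g] = group_mcp_prox(z[g], lam, gamma, step)
    return beta.reshape(p, K)                # row j holds the spline coefficients of predictor j

def iterative_fit(X, Y, Theta, lam, n_outer=3, ridge=1e-3):
    """Alternate between whitened coefficient updates and covariance updates."""
    n, D = Y.shape
    Sigma = np.eye(D)                        # first pass ignores residual correlation
    for _ in range(n_outer):
        Y_star, Theta_star = prewhiten(Y, Theta, Sigma)
        B = fit_group_mcp(X, Y_star, Theta_star, lam)
        resid = Y - X @ B @ Theta.T          # residual curves on the original scale
        Sigma = resid.T @ resid / n + ridge * np.eye(D)  # assumed covariance estimator
    return B, Sigma
```

A predictor is dropped from the model when its entire row of B is zero, which is what grouping the spline coefficients within each coefficient function achieves. In practice lam would be chosen by cross-validation, and a smoother covariance estimator may be preferable to the raw sample covariance used in this sketch.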