By Tracy Ke, Jiashun Jin, and Jianqing Fan
Princeton University and Carnegie Mellon University.
Ann Stat. 2014 Nov 1;42(6):2202-2242. doi: 10.1214/14-AOS1243.
Consider a linear model y = Xβ + ε, where X = (x_1, …, x_p) and ε ~ N(0, σ²I). The vector β is unknown and it is of interest to separate its nonzero coordinates from the zero ones (i.e., variable selection). Motivated by examples in long-memory time series (Fan and Yao, 2003) and the change-point problem (Bhattacharya, 1994), we are primarily interested in the case where the Gram matrix G = X'X is non-sparse but sparsifiable by a finite-order linear filter. We focus on the regime where signals are both rare and weak, so that successful variable selection is very challenging but is still possible. We approach this problem by a new procedure called covariance assisted screening and estimation (CASE). CASE first uses a linear filtering to reduce the original setting to a new regression model where the corresponding Gram (covariance) matrix is sparse. The new covariance matrix induces a sparse graph, which guides us to conduct multivariate screening without visiting all the submodels. By interacting with the signal sparsity, the graph enables us to decompose the original problem into many separated small-size subproblems (if only we know where they are!). Linear filtering also induces a so-called problem of information leakage, which can be overcome by the newly introduced patching technique. Together, these give rise to CASE, which is a two-stage Screen and Clean (Fan and Song, 2010; Wasserman and Roeder, 2009) procedure, where we first identify candidates of these submodels by screening, and then re-examine each candidate to remove false positives. For any procedure β̂ for variable selection, we measure the performance by the minimax Hamming distance between the sign vectors of β̂ and β. We show that in a broad class of situations where the Gram matrix is non-sparse but sparsifiable, CASE achieves the optimal rate of convergence. The results are successfully applied to long-memory time series and the change-point model.