Kong Dehan, An Baiguo, Zhang Jingwen, Zhu Hongtu
Department of Statistical Sciences, University of Toronto.
School of Statistics, Capital University of Economics and Business.
J Am Stat Assoc. 2020 Apr 30;115(529):403-424. doi: 10.1080/01621459.2018.1555092. Epub 2019 Apr 30.
The aim of this paper is to develop a low-rank linear regression model (L2RM) to correlate a high-dimensional response matrix with a high-dimensional vector of covariates when the coefficient matrices have low-rank structures. To handle the case in which the number of covariates is extremely large, we propose a fast and efficient screening procedure based on the spectral norm of each coefficient matrix. We develop an efficient estimation procedure based on trace norm regularization, which explicitly imposes the low-rank structure of the coefficient matrices. When both the dimension of the response matrix and that of the covariate vector diverge at an exponential order of the sample size, we investigate the sure independence screening property under mild conditions. We also systematically investigate theoretical properties of our estimation procedure, including estimation consistency, rank consistency, and a non-asymptotic error bound, under mild conditions. We further establish a theoretical guarantee for the overall solution of our two-step screening and estimation procedure. We examine the finite-sample performance of our screening and estimation methods using simulations and a large-scale imaging genetic dataset collected by the Philadelphia Neurodevelopmental Cohort (PNC) study.
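The two-step idea described in the abstract (screen covariates by the spectral norm of a coefficient matrix, then fit with a trace norm penalty) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the model form Y_i = sum_l x_il B_l + E_i, the use of marginal least-squares coefficients as the screening statistic, the plain proximal gradient solver, the function names, and the simulated data.

```python
import numpy as np


def spectral_norm_screening(X, Y, keep):
    """Rank covariates by the spectral norm of a marginal coefficient matrix.

    X has shape (n, s); Y has shape (n, p, q). The marginal statistic
    (1/n) * sum_i x_il * Y_i is an assumed form, not the paper's definition.
    """
    n = X.shape[0]
    Xc = (X - X.mean(axis=0)) / X.std(axis=0)  # standardize each covariate
    Yc = Y - Y.mean(axis=0)                    # center the responses
    B_marg = np.einsum("il,ipq->lpq", Xc, Yc) / n
    # Largest singular value of each p x q marginal coefficient matrix.
    scores = np.linalg.norm(B_marg, ord=2, axis=(1, 2))
    top = np.argsort(scores)[::-1][:keep]
    return np.sort(top), scores


def svt(M, tau):
    """Soft-threshold singular values: the proximal map of tau * trace norm."""
    U, sv, Vt = np.linalg.svd(M, full_matrices=False)
    return (U * np.maximum(sv - tau, 0.0)) @ Vt


def trace_norm_regression(X, Y, lam, n_iter=300):
    """Proximal gradient for
    (1/2n) * sum_i ||Y_i - sum_l x_il B_l||_F^2 + lam * sum_l ||B_l||_*.

    No intercept is fitted; the data are assumed to be centered.
    """
    n, s = X.shape
    B = np.zeros((s,) + Y.shape[1:])
    step = n / np.linalg.norm(X, ord=2) ** 2  # 1 / Lipschitz constant
    for _ in range(n_iter):
        resid = np.einsum("il,lpq->ipq", X, B) - Y
        grad = np.einsum("il,ipq->lpq", X, resid) / n
        Z = B - step * grad
        B = np.stack([svt(Z[l], step * lam) for l in range(s)])
    return B


# Simulated example (hypothetical sizes): two active rank-1 coefficient matrices.
rng = np.random.default_rng(0)
n, s, p, q = 200, 20, 8, 8
X = rng.normal(size=(n, s))
B_true = np.zeros((s, p, q))
for l in (0, 1):
    u, v = rng.normal(size=p), rng.normal(size=q)
    B_true[l] = 2.0 * np.outer(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
Y = np.einsum("il,lpq->ipq", X, B_true) + 0.5 * rng.normal(size=(n, p, q))

kept, scores = spectral_norm_screening(X, Y, keep=2)
B_hat = trace_norm_regression(X[:, kept], Y, lam=0.2)
```

Screening here costs one pass over the data per covariate, which is what makes it usable when the number of covariates is very large; the penalized fit is then run only on the retained covariates. The tuning values `keep` and `lam` are arbitrary choices for this sketch and would normally be selected by a data-driven rule.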