Antik Chakraborty, Anirban Bhattacharya, Bani K. Mallick
Department of Statistics, Texas A&M University, College Station, Texas, 77843, USA.
Biometrika. 2020 Mar;107(1):205-221. doi: 10.1093/biomet/asz056. Epub 2019 Nov 23.
We develop a Bayesian methodology aimed at simultaneously estimating low-rank and row-sparse matrices in a high-dimensional multiple-response linear regression model. We consider a carefully devised shrinkage prior on the matrix of regression coefficients which obviates the need to specify a prior on the rank, and shrinks the regression matrix towards low-rank and row-sparse structures. We provide theoretical support for the proposed methodology by proving minimax optimality of the posterior mean under the prediction risk in ultra-high-dimensional settings where the number of predictors can grow sub-exponentially relative to the sample size. A one-step post-processing scheme induced by group lasso penalties on the rows of the estimated coefficient matrix is proposed for variable selection, with default choices of tuning parameters. We additionally provide an estimate of the rank using a novel optimization function, achieving dimension reduction in the covariate space. We demonstrate the performance of the proposed methodology in an extensive simulation study and a real-data example.
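To make the post-processing step concrete, the following is a minimal illustrative sketch of a row-wise group-lasso thresholding operator of the kind the abstract alludes to: each row of an estimated coefficient matrix is soft-thresholded by its Euclidean norm, so rows with small norm are set exactly to zero (variable selection) while large rows are shrunk proportionally. The function name, the fixed penalty level, and the synthetic matrix are all hypothetical choices for demonstration, not the authors' actual procedure or their default tuning-parameter rule.

```python
import numpy as np

def group_lasso_row_threshold(C, lam):
    """Row-wise group-lasso (block soft-thresholding) operator.

    For each row c_i of C, returns
        c_i * max(0, 1 - lam / ||c_i||_2),
    zeroing out rows whose Euclidean norm is at most lam.
    This is the proximal operator of lam * sum_i ||c_i||_2.
    """
    out = np.zeros_like(C, dtype=float)
    row_norms = np.linalg.norm(C, axis=1)
    for i, n in enumerate(row_norms):
        if n > lam:
            out[i] = C[i] * (1.0 - lam / n)
    return out

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical "posterior mean" estimate: 5 predictors, 3 responses,
    # with only the first two rows carrying real signal.
    C_hat = np.vstack([
        rng.normal(scale=2.0, size=(2, 3)),   # signal rows
        rng.normal(scale=0.05, size=(3, 3)),  # near-zero noise rows
    ])
    C_sparse = group_lasso_row_threshold(C_hat, lam=0.5)
    selected = np.flatnonzero(np.linalg.norm(C_sparse, axis=1) > 0)
    print("selected predictor rows:", selected)
```

In the paper's setting the thresholding is applied to the posterior mean of the coefficient matrix, with the penalty level chosen by a default rule rather than the fixed `lam=0.5` used here.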