Wang Yue, Shojaie Ali, Randolph Timothy, Knight Parker, Ma Jing
Department of Biostatistics and Informatics, University of Colorado Anschutz Medical Campus.
Department of Biostatistics, University of Washington.
Ann Appl Stat. 2023 Dec;17(4):2944-2969. doi: 10.1214/23-aoas1746. Epub 2023 Oct 30.
Motivated by emerging applications in ecology, microbiology, and neuroscience, this paper studies high-dimensional regression with two-way structured data. To estimate the high-dimensional coefficient vector, we propose generalized matrix decomposition regression (GMDR), which efficiently leverages auxiliary information on row and column structures. GMDR extends principal component regression (PCR) to two-way structured data, but unlike PCR, GMDR selects the components that are most predictive of the outcome, leading to more accurate prediction. For inference on the regression coefficients of individual variables, we propose generalized matrix decomposition inference (GMDI), a general high-dimensional inferential framework for a large family of estimators that includes the proposed GMDR estimator. GMDI provides more flexibility for incorporating relevant auxiliary row and column structures. As a result, GMDI does not require the true regression coefficients to be sparse, but instead constrains the coordinate system representing the regression coefficients according to the column structure. GMDI also allows dependent and heteroscedastic observations. We study the theoretical properties of GMDI in terms of both the type-I error rate and power, and demonstrate the effectiveness of GMDR and GMDI in simulation studies and an application to human microbiome data.
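The contrast the abstract draws between PCR and outcome-driven component selection can be illustrated with a minimal sketch. This is not the authors' GMDR: an ordinary SVD stands in for the generalized matrix decomposition (so no row or column structure is used), and the selection rule, ranking components by the absolute inner product of their scores with the centered outcome, is an assumption made only for illustration. The function name `component_regression` and its `by` parameter are hypothetical.

```python
import numpy as np

def component_regression(X, y, k, by="outcome"):
    """Regress y on k SVD components of X (illustrative sketch, not GMDR).

    by="variance": classic PCR, keep the k leading components.
    by="outcome":  keep the k components whose scores are most strongly
                   associated with y (an assumed selection rule).
    Returns the coefficient vector on the original variables and the
    indices of the selected components.
    """
    Xc = X - X.mean(axis=0)          # center predictors
    yc = y - y.mean()                # center outcome
    U, d, Vt = np.linalg.svd(Xc, full_matrices=False)

    # Drop numerically zero singular values to avoid dividing by ~0.
    keep = d > 1e-10 * d[0]
    U, d, Vt = U[:, keep], d[keep], Vt[keep]

    proj = U.T @ yc                  # projection of y on each component
    if by == "outcome":
        order = np.argsort(-np.abs(proj))
    else:
        order = np.arange(d.size)    # already sorted by singular value
    idx = order[:k]

    # beta = V_S D_S^{-1} U_S^T y, restricted to the selected set S
    beta = Vt[idx].T @ (proj[idx] / d[idx])
    return beta, idx
```

Because the left singular vectors are orthonormal, the residual sum of squares equals the squared norm of the centered outcome minus the sum of the squared projections on the selected components. Choosing the components with the largest absolute projections therefore never fits the training data worse than choosing the same number of leading components by variance.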