Desai Neel, Baladandayuthapani Veerabhadran, Shinohara Russell T, Morris Jeffrey S
Division of Biostatistics, University of Pennsylvania.
Department of Biostatistics, University of Michigan - Ann Arbor.
J Comput Graph Stat. 2025;34(2):591-600. doi: 10.1080/10618600.2024.2407453. Epub 2024 Nov 22.
We propose a new method for the simultaneous selection and estimation of multivariate sparse additive models with correlated errors. Our method, called Covariance Assisted Multivariate Penalized Additive Regression (CoMPAdRe), simultaneously selects among null, linear, and smooth non-linear effects for each predictor while jointly estimating the sparse residual structure among responses, with the motivation that accounting for inter-response correlation structure can improve both variable selection accuracy and estimation efficiency. CoMPAdRe is constructed in a computationally efficient way that allows the selection and estimation of linear and non-linear covariate effects to be conducted in parallel across responses. Compared to single-response approaches that marginally select linear and non-linear covariate effects, we demonstrate in simulation studies that joint multivariate modeling leads to gains in both estimation efficiency and selection accuracy, with larger gains in settings where the signal is moderate relative to the level of noise. We apply our approach to protein-mRNA expression levels from multiple breast cancer pathways obtained from The Cancer Proteome Atlas and characterize both mRNA-protein associations and protein-protein subnetworks for each pathway. We find non-linear mRNA-protein associations for the Core Reactive, EMT, PI3K-AKT, and RTK pathways. Supplementary Materials are available online.