Chen Jiajia, Zhang Xiaoqin, Hron Karel
School of Statistics, Shanxi University of Finance and Economics, Taiyuan, People's Republic of China.
Department of Mathematical Analysis and Applications of Mathematics, Faculty of Science, Palacký University, Olomouc, Czech Republic.
J Appl Stat. 2020 Jul 22;48(16):3130-3149. doi: 10.1080/02664763.2020.1795813. eCollection 2021.
The common approach to regression analysis with compositional variables is to express the compositions in log-ratio coordinates (or coefficients) and then perform standard statistical processing in real space. As with ordinary real-valued data, standard least squares regression fails when the total number of parts of all compositional covariates exceeds the number of observations. The aim of this study is to analyze in detail partial least squares (PLS) regression, which can deal with this problem. In this paper, we focus on PLS regression between more than one compositional response variable and more than one compositional covariate. First, we give the PLS regression model in terms of log-ratio coordinates of the compositional variables; then we express the PLS model directly in the simplex. We also prove that the PLS model is invariant under a change of coordinate system, such as ilr coordinates with a different contrast matrix or clr coefficients. Moreover, we give estimation and inference procedures for the parameters of the PLS model. Finally, the PLS model with clr coefficients is used to analyze the relationship between the chemical metabolites of Astragali Radix and the plasma metabolites of rats after administration of Astragali Radix.
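To make the workflow in the abstract concrete, here is a minimal NumPy sketch under stated assumptions. It is not the authors' implementation. The abstract does not specify the estimation algorithm, so the sketch assumes a standard NIPALS-style PLS2. At each step the weight vector is the dominant left singular vector of X^T Y, and both blocks are then deflated. The data are simulated Dirichlet compositions, not the Astragali Radix data. The helper names (`closure`, `clr`, `clr_inv`, `pivot_basis`, `pls2_fit`, `pls2_predict`) are hypothetical. `pivot_basis` builds one particular orthonormal ilr contrast matrix V (columns orthogonal to the vector of ones, so ilr = clr V and clr = ilr V^T). The script fits PLS once on clr coefficients and once on ilr coordinates with a randomly rotated contrast matrix, in a setting with more covariate parts than observations. It then checks that the fitted values agree after mapping back to clr, which illustrates the invariance claim on this example.

```python
import numpy as np

def closure(x):
    """Rescale each row of a positive array so that it sums to 1."""
    return x / x.sum(axis=1, keepdims=True)

def clr(x):
    """Centered log-ratio coefficients: log-parts minus their row mean."""
    logx = np.log(x)
    return logx - logx.mean(axis=1, keepdims=True)

def clr_inv(z):
    """Map clr coefficients back to compositions in the simplex."""
    return closure(np.exp(z))

def pivot_basis(D):
    """Hypothetical helper: D x (D-1) orthonormal contrast matrix whose columns
    are orthogonal to the ones vector (pivot-coordinate style); ilr = clr @ V."""
    V = np.zeros((D, D - 1))
    for j in range(D - 1):
        k = D - j - 1  # number of parts after part j
        V[j, j] = np.sqrt(k / (k + 1))
        V[j + 1:, j] = -1.0 / np.sqrt(k * (k + 1))
    return V

def pls2_fit(X, Y, n_comp):
    """Assumed NIPALS-style PLS2; returns coefficients B and column means."""
    x_mean, y_mean = X.mean(axis=0), Y.mean(axis=0)
    Xa, Ya = X - x_mean, Y - y_mean
    W, P, Q = [], [], []
    for _ in range(n_comp):
        # Weight = dominant left singular vector of Xa^T Ya
        # (the fixed point of the NIPALS inner loop).
        u, _, _ = np.linalg.svd(Xa.T @ Ya, full_matrices=False)
        w = u[:, 0]
        t = Xa @ w                 # scores
        tt = t @ t
        p = Xa.T @ t / tt          # X loadings
        q = Ya.T @ t / tt          # Y loadings
        Xa = Xa - np.outer(t, p)   # deflate X
        Ya = Ya - np.outer(t, q)   # deflate Y
        W.append(w); P.append(p); Q.append(q)
    W, P, Q = np.array(W).T, np.array(P).T, np.array(Q).T
    B = W @ np.linalg.solve(P.T @ W, Q.T)
    return B, x_mean, y_mean

def pls2_predict(model, X):
    B, x_mean, y_mean = model
    return (X - x_mean) @ B + y_mean

rng = np.random.default_rng(0)
n, Dx, Dy, A = 10, 15, 4, 2  # more covariate parts than observations
Xcomp = rng.dirichlet(np.ones(Dx), size=n)
Ycomp = rng.dirichlet(np.ones(Dy), size=n)
Xclr, Yclr = clr(Xcomp), clr(Ycomp)

# Least squares breaks down: the centered design has rank below Dx - 1.
Xc = Xclr - Xclr.mean(axis=0)
print("rank of Xc^T Xc:", np.linalg.matrix_rank(Xc.T @ Xc), "of", Dx - 1)

# PLS on clr coefficients.
Yhat_clr = pls2_predict(pls2_fit(Xclr, Yclr, A), Xclr)

# PLS on ilr coordinates built from a rotated (different) contrast matrix.
Qrot, _ = np.linalg.qr(rng.standard_normal((Dx - 1, Dx - 1)))
Vx = pivot_basis(Dx) @ Qrot
Vy = pivot_basis(Dy)
Yhat_ilr = pls2_predict(pls2_fit(Xclr @ Vx, Yclr @ Vy, A), Xclr @ Vx)

# Mapping ilr fitted values back to clr should reproduce the clr fit.
print("same fit:", np.allclose(Yhat_ilr @ Vy.T, Yhat_clr))
Yhat_comp = clr_inv(Yhat_clr)  # fitted response compositions in the simplex
```

In this sketch the agreement follows from the structure of the data. Every clr row is orthogonal to the ones vector, so clr = ilr V^T, and the PLS weights, scores and loadings transform through V without changing the scores themselves. The paper's own proof and its inference procedures are not reproduced here.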