de Sousa J, Hron K, Fačevicová K, Filzmoser P
Department of Mathematical Analysis and Applications of Mathematics, Faculty of Science, Palacký University, Olomouc, Czech Republic.
Institute of Statistics and Mathematical Methods in Economics, Vienna University of Technology, Vienna, Austria.
J Appl Stat. 2020 Feb 4;48(2):214-233. doi: 10.1080/02664763.2020.1722078. eCollection 2021.
A data table arranged according to two factors can often be considered a compositional table. An example is the number of unemployed people, split according to gender and age class. When such a table is analyzed as a composition, the relevant information consists of the ratios between its cells. This is particularly useful when several compositional tables are analyzed jointly and their absolute numbers lie in very different ranges, e.g. when unemployment data from different countries are considered. Within the framework of the logratio methodology, compositional tables can be decomposed into independent and interactive parts, and orthonormal coordinates can be assigned to these parts. However, these coordinates usually require some prior knowledge about the data, and they are not easy to handle when exploring the relationships between the given factors. Here we propose a special choice of coordinates with a direct relation to centered logratio (clr) coefficients, which are particularly useful for interpretation in terms of the original cells of the tables. With these coordinates, robust principal component analysis (rPCA) is performed for dimension reduction, allowing the relationships between the factors to be investigated. The link between orthonormal coordinates and clr coefficients makes it possible to apply rPCA, which would otherwise suffer from the singularity of clr coefficients.
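The clr coefficients and the decomposition into independent and interactive parts mentioned above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `clr_table` and `decompose_clr` and the counts in `table` are hypothetical, and the decomposition is assumed to take the usual two-way form on the log scale (row and column means of the clr coefficients give the independent part, the remainder gives the interactive part).

```python
import numpy as np

def clr_table(x):
    """Centered logratio (clr) coefficients of a strictly positive I x J table."""
    logx = np.log(np.asarray(x, dtype=float))
    # Subtract the mean log over all cells (log of the geometric mean).
    return logx - logx.mean()

def decompose_clr(x):
    """Split clr coefficients into independent and interactive parts.

    Assumed form: a two-way decomposition on the log scale. Because clr
    coefficients have overall mean zero, no grand-mean term is needed.
    """
    c = clr_table(x)
    row_effect = c.mean(axis=1, keepdims=True)   # shape (I, 1)
    col_effect = c.mean(axis=0, keepdims=True)   # shape (1, J)
    independent = row_effect + col_effect        # broadcasts to (I, J)
    interactive = c - independent
    return c, independent, interactive

# Hypothetical unemployment counts: rows = gender, columns = age classes.
table = np.array([[120.0, 340.0, 210.0],
                  [95.0, 410.0, 180.0]])

c, independent, interactive = decompose_clr(table)

# clr coefficients sum to zero; this linear constraint is the source of the
# singularity that prevents applying rPCA to them directly.
print(np.isclose(c.sum(), 0.0))

# Only ratios matter: rescaling the whole table leaves clr unchanged.
print(np.allclose(clr_table(1000.0 * table), c))
```

Because the clr coefficients of every table satisfy the same zero-sum constraint, their covariance matrix across several tables is singular, which is why the abstract works with orthonormal coordinates linked to the clr coefficients instead.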