Josse Julie, Holmes Susan
Department of Statistics, Agrocampus Ouest - INRIA, Saclay Paris Sud University, France.
Department of Statistics, Stanford University, California, USA.
Stat Surv. 2016;10:132-167. doi: 10.1214/16-SS116. Epub 2016 Nov 17.
Simple correlation coefficients between two variables have been generalized in many ways to measure association between two matrices. Coefficients such as the RV coefficient, the distance covariance (dCov) coefficient and kernel-based coefficients are used by different research communities. Scientists use these coefficients to test whether two random vectors are linked. Once testing has established that such an association exists, a next step, often ignored, is to explore and uncover the association's underlying patterns. This article surveys measures of dependence between random vectors and tests of independence, and emphasizes the connections and differences between the various approaches. After providing definitions of the coefficients and associated tests, we present recent improvements that enhance their statistical properties and ease of interpretation. We summarize multi-table approaches and provide scenarios where the indices can provide useful summaries of heterogeneous multi-block data. We illustrate these different strategies on several examples of real data and suggest directions for future research.
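To make the idea of a matrix-level correlation and its associated test concrete, here is a minimal sketch of one such coefficient, assuming the standard definition of the RV coefficient for two column-centered data matrices observed on the same n rows: RV(X, Y) = tr(XX'YY') / sqrt(tr((XX')^2) tr((YY')^2)). The function names `rv_coefficient` and `permutation_pvalue` are hypothetical, and the row-permutation test is a generic illustration, not necessarily the testing procedure recommended in the article.

```python
import numpy as np


def rv_coefficient(X, Y):
    """RV coefficient between two data matrices with the same number of rows."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    # Center each column so cross-products are proportional to covariances.
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    Sxy = Xc.T @ Yc
    Sxx = Xc.T @ Xc
    Syy = Yc.T @ Yc
    # tr(XX'YY') equals the squared Frobenius norm of X'Y (same for X'X, Y'Y).
    numerator = np.sum(Sxy ** 2)
    denominator = np.sqrt(np.sum(Sxx ** 2) * np.sum(Syy ** 2))
    return numerator / denominator


def permutation_pvalue(X, Y, n_perm=999, seed=0):
    """Permutation p-value for independence (illustrative assumption):
    shuffle the rows of Y to break any link with X."""
    rng = np.random.default_rng(seed)
    Y = np.asarray(Y, dtype=float)
    observed = rv_coefficient(X, Y)
    exceed = 0
    for _ in range(n_perm):
        shuffled = Y[rng.permutation(Y.shape[0])]
        if rv_coefficient(X, shuffled) >= observed:
            exceed += 1
    # Add one to numerator and denominator so the p-value is never zero.
    return observed, (exceed + 1) / (n_perm + 1)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.normal(size=(50, 3))
    # Y depends linearly on X plus a little noise (synthetic data).
    Y = X @ rng.normal(size=(3, 4)) + 0.1 * rng.normal(size=(50, 4))
    rv, p = permutation_pvalue(X, Y)
    print(f"RV = {rv:.3f}, permutation p-value = {p:.3f}")
```

The coefficient lies between 0 and 1 and is unchanged when either matrix is rescaled by a nonzero constant, which is why it serves as a matrix analogue of a squared correlation.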