Department of BioMechanical Engineering, Faculty of Mechanical, Maritime and Materials Engineering.
Department of Psychology, University of Texas.
Psychol Methods. 2016 Sep;21(3):273-90. doi: 10.1037/met0000079. Epub 2016 May 23.
The Pearson product–moment correlation coefficient (r_p) and the Spearman rank correlation coefficient (r_s) are widely used in psychological research. We compare r_p and r_s on 3 criteria: variability, bias with respect to the population value, and robustness to an outlier. Using simulations across low (N = 5) to high (N = 1,000) sample sizes, we show that, for normally distributed variables, r_p and r_s have similar expected values but r_s is more variable, especially when the correlation is strong. However, when the variables have high kurtosis, r_p is more variable than r_s. Next, we conducted a sampling study of a psychometric dataset featuring symmetrically distributed data with light tails, and of 2 Likert-type survey datasets, 1 with light-tailed and the other with heavy-tailed distributions. Consistent with the simulations, r_p had lower variability than r_s in the psychometric dataset. In the survey datasets with heavy-tailed variables in particular, r_s had lower variability than r_p, and often corresponded more accurately to the population Pearson correlation coefficient (R_p) than r_p did. The simulations and the sampling studies showed that variability in terms of standard deviations can be reduced by about 20% by choosing r_s instead of r_p. In comparison, increasing the sample size by a factor of 2 results in a 41% reduction of the standard deviations of r_s and r_p. In conclusion, r_p is suitable for light-tailed distributions, whereas r_s is preferable when variables feature heavy-tailed distributions or when outliers are present, as is often the case in psychological research.
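The kind of simulation summarized in the abstract can be sketched in a few lines. The following is a minimal illustration, not the authors' code: it repeatedly draws samples, computes both coefficients on each sample, and reports the standard deviation of each estimate across replications. The function names, the NumPy-only implementation, and the use of a bivariate Student t distribution with 3 degrees of freedom as the high-kurtosis case are all assumptions made for this example; the study's own distributions and settings may differ.

```python
import numpy as np


def pearson(x, y):
    """Pearson product-moment correlation of two 1-D arrays."""
    return np.corrcoef(x, y)[0, 1]


def spearman(x, y):
    """Spearman rank correlation, computed as the Pearson correlation of ranks.

    Ranks come from a double argsort, which is only valid without ties;
    that holds here because the simulated data are continuous.
    """
    rank_x = np.argsort(np.argsort(x))
    rank_y = np.argsort(np.argsort(y))
    return np.corrcoef(rank_x, rank_y)[0, 1]


def sample_pair(rng, n, rho, heavy_tailed):
    """Draw n (x, y) pairs whose underlying normal correlation is rho.

    heavy_tailed=False: bivariate normal.
    heavy_tailed=True: bivariate Student t with 3 degrees of freedom
    (an assumed stand-in for a high-kurtosis distribution).
    """
    cov = [[1.0, rho], [rho, 1.0]]
    z = rng.multivariate_normal([0.0, 0.0], cov, size=n)
    if heavy_tailed:
        df = 3
        # Dividing each row by a shared sqrt(chi2 / df) yields a multivariate t.
        z = z / np.sqrt(rng.chisquare(df, size=(n, 1)) / df)
    return z[:, 0], z[:, 1]


def sd_of_estimates(n, rho, heavy_tailed, reps=2000, seed=0):
    """Return (SD of Pearson estimates, SD of Spearman estimates) over reps samples."""
    rng = np.random.default_rng(seed)
    rp = np.empty(reps)
    rs = np.empty(reps)
    for i in range(reps):
        x, y = sample_pair(rng, n, rho, heavy_tailed)
        rp[i] = pearson(x, y)
        rs[i] = spearman(x, y)
    return rp.std(ddof=1), rs.std(ddof=1)


if __name__ == "__main__":
    # Illustrative settings only: N = 50 and rho = 0.8 are arbitrary choices.
    for heavy in (False, True):
        sd_rp, sd_rs = sd_of_estimates(n=50, rho=0.8, heavy_tailed=heavy)
        label = "heavy-tailed (t, 3 df)" if heavy else "normal"
        print(f"{label}: SD(r_p) = {sd_rp:.3f}, SD(r_s) = {sd_rs:.3f}")
```

Because both coefficients are computed on the same simulated samples, their standard deviations can be compared directly within each condition. Real survey data with ties (for example Likert items) would need average ranks rather than the double-argsort shortcut used here.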