Bishara Anthony J, Hittner James B
College of Charleston, Charleston, SC, USA.
Educ Psychol Meas. 2015 Oct;75(5):785-804. doi: 10.1177/0013164414557639. Epub 2014 Nov 11.
It is more common for educational and psychological data to be nonnormal than to be approximately normal. This tendency may lead to bias and error in point estimates of the Pearson correlation coefficient. In a series of Monte Carlo simulations, the Pearson correlation was examined under conditions of normal and nonnormal data, and it was compared with its major alternatives, including the Spearman rank-order correlation, the bootstrap estimate, the Box-Cox transformation family, and a general normalizing transformation (i.e., rankit), as well as with various bias adjustments. Nonnormality caused the correlation coefficient to be inflated by up to +.14, particularly when the nonnormality involved heavy-tailed distributions. Traditional bias adjustments worsened this problem, further inflating the estimate. The Spearman and rankit correlations eliminated this inflation and provided conservative estimates. Rankit also minimized random error for most sample sizes, except for the smallest samples (n = 10), where bootstrapping was more effective. Overall, results justify the use of carefully chosen alternatives to the Pearson correlation when normality is violated.
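As an illustration of three of the estimators compared above, the following minimal sketch computes the Pearson, Spearman, and rankit correlations using only the Python standard library. It is not the authors' simulation code. The abstract does not give the rankit formula, so the sketch assumes the commonly used definition, the inverse normal CDF of (rank - 0.5) / n. The function names and the tie-handling rule (average ranks) are likewise choices made for this example.

```python
from statistics import NormalDist, fmean


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def rankit(values):
    """Rankit scores, assumed here to be inv_cdf((rank - 0.5) / n)."""
    n = len(values)
    std_normal = NormalDist()
    return [std_normal.inv_cdf((r - 0.5) / n) for r in average_ranks(values)]


def spearman(x, y):
    """Spearman rank-order correlation: Pearson applied to the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def rankit_correlation(x, y):
    """Pearson correlation computed on rankit-transformed scores."""
    return pearson(rankit(x), rankit(y))
```

Calling `pearson(x, y)`, `spearman(x, y)`, and `rankit_correlation(x, y)` on the same pair of samples gives the three point estimates side by side. Both rank-based versions depend only on the ordering of the data, so they are unchanged by any monotone increasing transformation of either variable.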