Hodges Cooper B, Stone Bryant M, Johnson Paula K, Carter James H, Sawyers Chelsea K, Roby Patricia R, Lindsey Hannah M
Department of Neurology, University of Utah School of Medicine, Salt Lake City, UT, USA.
Department of Psychology, Brigham Young University, Provo, UT, USA.
Behav Res Methods. 2023 Sep;55(6):2813-2837. doi: 10.3758/s13428-022-01932-2. Epub 2022 Aug 11.
Researcher degrees of freedom can affect the results of hypothesis tests and, consequently, the conclusions drawn from the data. Previous research has documented variability in accuracy, speed, and documentation of output across statistical software packages. In the current investigation, we conducted Pearson's chi-square test of independence, Spearman's rank-order correlation, Kruskal-Wallis one-way analysis of variance, Wilcoxon Mann-Whitney U rank-sum tests, and Wilcoxon signed-rank tests, along with estimates of skewness and kurtosis, on large, medium, and small samples of real and simulated data in SPSS, SAS, Stata, and R, and compared the results with those obtained through hand calculation using the raw computational formulas. Multiple inconsistencies were found among the results produced by the different statistical packages, attributable to algorithmic variation, computational error, and differences in statistical output. The most notable inconsistencies were due to algorithmic variations in the computation of Pearson's chi-square test conducted on 2 × 2 tables, where differences in p-values reported by different software packages ranged from .005 to .162, largely as a function of sample size. We discuss how such inconsistencies may influence the conclusions drawn from the results of statistical analyses depending on the statistical software used, and we urge researchers to analyze their data across multiple packages to check for inconsistencies and to report details of the statistical procedure used for data analysis.
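As an illustration of how an algorithmic choice can move the p-value of a chi-square test on a 2 × 2 table, the sketch below computes the statistic with and without Yates's continuity correction. The abstract does not name the specific algorithmic variations involved, so treating the continuity correction as the example is an assumption made here; it is simply one well-known option that packages may or may not apply by default (R's `chisq.test`, for instance, applies it to 2 × 2 tables unless `correct = FALSE` is passed). The function names and the cell counts are hypothetical and are not data from the study.

```python
import math


def chi2_sf_df1(x):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom.

    For df = 1 the survival function reduces to erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(x / 2.0))


def chi2_2x2(a, b, c, d, yates=False):
    """Pearson chi-square test for the 2 x 2 table [[a, b], [c, d]].

    Returns (statistic, p_value). With yates=True, Yates's continuity
    correction is applied, clamped at zero so that the correction can
    never exceed the observed deviation.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    diff = abs(a * d - b * c)
    if yates:
        diff = max(0.0, diff - n / 2.0)
    stat = n * diff ** 2 / (row1 * row2 * col1 * col2)
    return stat, chi2_sf_df1(stat)


if __name__ == "__main__":
    # Hypothetical small-sample table (illustrative counts only).
    table = (12, 5, 6, 13)
    stat_plain, p_plain = chi2_2x2(*table, yates=False)
    stat_corr, p_corr = chi2_2x2(*table, yates=True)
    print(f"uncorrected: chi2 = {stat_plain:.4f}, p = {p_plain:.4f}")
    print(f"Yates:       chi2 = {stat_corr:.4f}, p = {p_corr:.4f}")
    print(f"difference in p: {p_corr - p_plain:.4f}")
```

Because the correction subtracts a fixed n/2 from |ad − bc|, its relative effect shrinks as the cell counts grow, which is in keeping with the abstract's remark that the discrepancies depended largely on sample size. Reporting which variant was used lets readers reproduce the reported p-value in any package.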