Charles Perkins Centre, Central Clinical School, Faculty of Medicine and Health, The University of Sydney, Sydney, New South Wales, Australia.
Department of Epidemiology and Biostatistics, Indiana University School of Public Health-Bloomington, Bloomington, IN, USA.
Am J Clin Nutr. 2021 Mar 11;113(3):517-524. doi: 10.1093/ajcn/nqaa357.
The use of classic nonparametric tests (cNPTs), such as the Kruskal-Wallis and Mann-Whitney U tests, for between-group comparisons of means and medians in the presence of unequal variance may lead to marked increases in the rate of falsely rejecting null hypotheses and to decreases in statistical power. Yet this practice remains prevalent in the scientific literature, including the nutrition and obesity literature. Some nutrition and obesity studies use a cNPT in the presence of unequal variance (i.e., heteroscedasticity), sometimes on the mistaken rationale that the test corrects for heteroscedasticity. Herein, we discuss misconceptions about using cNPTs in the presence of heteroscedasticity. We then discuss the assumptions, purposes, and limitations of 3 common tests used to test for mean differences among multiple groups: 2 parametric tests, Fisher's ANOVA and Welch's ANOVA, and 1 cNPT, the Kruskal-Wallis test. To document the impact of heteroscedasticity on the validity of these tests under conditions similar to those in nutrition and obesity research, we conducted simple simulations and assessed type I error rates (i.e., false positives, defined as incorrectly rejecting a true null hypothesis). We demonstrate that the type I error rates of Fisher's ANOVA, which does not account for heteroscedasticity, and of the Kruskal-Wallis test, which tests for differences in distributions rather than means, deviated from the expected significance level. Deviation from the expected type I error rate grew as heteroscedasticity increased, especially when sample sizes were imbalanced. We provide brief tutorial guidance for authors, editors, and reviewers on identifying appropriate statistical tests when test assumptions are violated, with a particular focus on cNPTs.
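The kind of simulation described above can be illustrated with a minimal sketch. This is not the authors' simulation code: the group sizes (10, 20, 40), standard deviations (4, 2, 1), number of replications, seed, and all function names are hypothetical choices. Every replication draws 3 normal groups with the same mean, so the null hypothesis of equal means is true and any rejection is a type I error. The sketch is restricted to exactly 3 groups because then the p-values have closed forms: an F(2, ν) variable satisfies P(F > x) = (1 + 2x/ν)^(-ν/2), and a chi-square variable with 2 df satisfies P(X > h) = exp(-h/2). The Kruskal-Wallis test uses its usual chi-square approximation without a tie correction, since ties occur with probability zero for continuous normal data.

```python
import math
import random
import statistics


def f_sf_df1_2(x, df2):
    """P(F > x) for an F(2, df2) distribution (closed form, df2 may be non-integer)."""
    return (1.0 + 2.0 * x / df2) ** (-df2 / 2.0)


def _check_three_groups(groups):
    if len(groups) != 3:
        raise ValueError("this sketch only handles exactly 3 groups")


def fisher_anova_p(groups):
    """Fisher's one-way ANOVA: pools within-group variance (assumes equal variances)."""
    _check_three_groups(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    ss_between = sum(len(g) * (statistics.fmean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((len(g) - 1) * statistics.variance(g) for g in groups)
    df1, df2 = 2, n_total - 3
    f_stat = (ss_between / df1) / (ss_within / df2)
    return f_sf_df1_2(f_stat, df2)


def welch_anova_p(groups):
    """Welch's ANOVA: weights each group by n_i / s_i^2, so variances may differ."""
    _check_three_groups(groups)
    k = 3
    weights = [len(g) / statistics.variance(g) for g in groups]
    w_sum = sum(weights)
    means = [statistics.fmean(g) for g in groups]
    weighted_mean = sum(w * m for w, m in zip(weights, means)) / w_sum
    a = sum(w * (m - weighted_mean) ** 2 for w, m in zip(weights, means)) / (k - 1)
    lam = sum((1 - w / w_sum) ** 2 / (len(g) - 1) for w, g in zip(weights, groups))
    b = 1 + 2 * (k - 2) / (k ** 2 - 1) * lam
    df2 = (k ** 2 - 1) / (3 * lam)  # Welch's approximate denominator df
    return f_sf_df1_2(a / b, df2)


def kruskal_wallis_p(groups):
    """Kruskal-Wallis H test, chi-square(2) approximation, no tie correction."""
    _check_three_groups(groups)
    pooled = sorted((x, gi) for gi, g in enumerate(groups) for x in g)
    n_total = len(pooled)
    rank_sums = [0.0, 0.0, 0.0]
    for rank, (_, gi) in enumerate(pooled, start=1):
        rank_sums[gi] += rank
    h = 12 / (n_total * (n_total + 1)) * sum(
        r * r / len(g) for r, g in zip(rank_sums, groups)
    ) - 3 * (n_total + 1)
    return math.exp(-max(h, 0.0) / 2)  # P(chi-square with 2 df > h)


def type_i_error_rates(ns=(10, 20, 40), sds=(4.0, 2.0, 1.0), reps=2000, alpha=0.05, seed=1):
    """Share of null-true replications in which each test rejects at level alpha."""
    rng = random.Random(seed)
    tests = {"fisher": fisher_anova_p, "welch": welch_anova_p, "kruskal": kruskal_wallis_p}
    rejections = dict.fromkeys(tests, 0)
    for _ in range(reps):
        # All groups share mean 0; only the standard deviations (and sizes) differ.
        groups = [[rng.gauss(0.0, sd) for _ in range(n)] for n, sd in zip(ns, sds)]
        for name, test in tests.items():
            rejections[name] += int(test(groups) < alpha)
    return {name: count / reps for name, count in rejections.items()}


if __name__ == "__main__":
    # Hypothetical scenario: the smallest group has the largest variance.
    for name, rate in type_i_error_rates().items():
        print(f"{name:>8}: {rate:.3f}")
```

A rate well above the nominal 0.05 means the test rejects a true null too often. With equal sample sizes and equal standard deviations, all three rates should sit near 0.05, which is a useful sanity check before varying `ns` and `sds`. For more than 3 groups, the closed-form survival functions no longer apply, and a statistics library such as `scipy.stats.f.sf` and `scipy.stats.chi2.sf` would be used instead.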