Kuiper Rebecca M, Nederhoff Tim, Klugkist Irene
Department of Methodology and Statistics, Utrecht University, The Netherlands.
Br J Math Stat Psychol. 2015 May;68(2):220-45. doi: 10.1111/bmsp.12041. Epub 2014 Jun 28.
In this paper, the performance of six types of techniques for comparisons of means is examined. These six emerge from the distinction between the method employed (hypothesis testing, model selection using information criteria, or Bayesian model selection) and the set of hypotheses that is investigated (a classical, exploration-based set of hypotheses containing equality constraints on the means, or a theory-based limited set of hypotheses with equality and/or order restrictions). A simulation study is conducted to examine the performance of these techniques. We demonstrate that, if one has specific, a priori specified hypotheses, confirmation (i.e., investigating theory-based hypotheses) has advantages over exploration (i.e., examining all possible equality-constrained hypotheses). Furthermore, examining reasonable order-restricted hypotheses has more power to detect the true effect/non-null hypothesis than evaluating only equality restrictions. Additionally, when investigating more than one theory-based hypothesis, model selection is preferred over hypothesis testing. Because of the first two results, we further examine the techniques that are able to evaluate order restrictions in a confirmatory fashion by examining their performance when the homogeneity of variance assumption is violated. Results show that the techniques are robust to heterogeneity when the sample sizes are equal. When the sample sizes are unequal, the performance is affected by heterogeneity. The size and direction of the deviations from the baseline, where there is no heterogeneity, depend on the effect size (of the means) and on the trend in the group variances with respect to the ordering of the group sizes. Importantly, the deviations are less pronounced when the group variances and sizes exhibit the same trend (e.g., are both increasing with group number).
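The kind of heterogeneity simulation described above can be illustrated with a minimal sketch. It is a stand-in, not the authors' study: it uses the classical one-way ANOVA F test rather than any of the order-restricted techniques the paper evaluates, and the group sizes, standard deviations, number of replications, and function names are all assumptions chosen for illustration. The critical value is estimated from a homoscedastic baseline simulation, so only the Python standard library is needed.

```python
import random


def f_statistic(groups):
    """One-way ANOVA F statistic for a list of samples (one list per group)."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))


def simulate_f(sizes, sds, reps, rng):
    """Draw `reps` F statistics when all population means are equal (zero)."""
    stats = []
    for _ in range(reps):
        groups = [[rng.gauss(0.0, sd) for _ in range(n)] for n, sd in zip(sizes, sds)]
        stats.append(f_statistic(groups))
    return stats


def rejection_rate(sizes, sds, reps=2000, alpha=0.05, seed=1):
    """Share of null data sets rejected under the given group standard deviations.

    The critical value is the empirical (1 - alpha) quantile of F under the
    homoscedastic baseline (all standard deviations equal to 1).
    """
    rng = random.Random(seed)
    baseline = sorted(simulate_f(sizes, [1.0] * len(sizes), reps, rng))
    critical = baseline[min(int((1 - alpha) * reps), reps - 1)]
    hetero = simulate_f(sizes, sds, reps, rng)
    return sum(f > critical for f in hetero) / reps


if __name__ == "__main__":
    # Illustrative scenarios (assumed values, not taken from the paper).
    scenarios = {
        "equal sizes, increasing variances": ((20, 20, 20), (1.0, 2.0, 3.0)),
        "sizes and variances both increasing": ((10, 20, 30), (1.0, 2.0, 3.0)),
        "sizes increasing, variances decreasing": ((10, 20, 30), (3.0, 2.0, 1.0)),
    }
    for label, (sizes, sds) in scenarios.items():
        print(f"{label}: rejection rate = {rejection_rate(sizes, sds):.3f}")
```

Comparing each scenario's rejection rate with the nominal 0.05 shows how far that scenario deviates from the no-heterogeneity baseline. The same skeleton could be pointed at a different test statistic or selection criterion by swapping out `f_statistic`.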