John Ruscio, Tara Mullen
The College of New Jersey.
Multivariate Behav Res. 2012 Mar 30;47(2):201-23. doi: 10.1080/00273171.2012.658329.
It is good scientific practice to report an appropriate estimate of effect size together with a confidence interval (CI) that indicates the precision with which a population effect was estimated. For comparisons of 2 independent groups, a probability-based effect size estimator, A, has many appealing properties (e.g., it is easy to understand, robust to violations of parametric assumptions, and insensitive to outliers). A equals the area under a receiver operating characteristic (ROC) curve and is closely related to the popular Wilcoxon-Mann-Whitney nonparametric tests. We performed a simulation study comparing 9 analytic and 3 empirical (bootstrap) methods for constructing a CI for A. These methods can yield very different CIs for the same data. The experimental design crossed 6 factors to yield 324 cells representing challenging but realistic data conditions. Results were examined using several criteria, with emphasis on how closely observed CI coverage probabilities approximated nominal levels. On the basis of the simulation results, we recommend the bias-corrected and accelerated (BCa) bootstrap method for constructing a CI for A. Bootstrap methods also provided the least biased and most accurate estimates of the standard error of A. An empirical illustration underscores the importance of choosing an appropriate CI method. It examines differences in scores on a citation-based index of scholarly impact between faculty at low-ranked and high-ranked research universities.
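The two ideas in the abstract can be illustrated in a short Python sketch. The first is the statistic A itself. The second is a BCa bootstrap CI for it. This is not the authors' code, and all function names (`a_statistic`, `quantile`, `bca_ci_for_a`) are hypothetical.

The sketch defines A as P(X > Y) + 0.5 P(X = Y), computed over every cross-group pair of scores. This equals the Mann-Whitney U for the first group divided by n1 × n2, which is the empirical ROC area.

Several choices in the BCa part are assumptions and may differ from the procedure used in the simulation study:

- Each group is resampled separately.
- Ties are counted as half when computing the bias correction.
- The acceleration constant comes from a delete-one jackknife over the pooled observations of both groups.

```python
import random
from statistics import NormalDist


def a_statistic(x, y):
    """A = P(X > Y) + 0.5 * P(X == Y), estimated over all (x_i, y_j) pairs.

    Equivalent to the Mann-Whitney U for group x divided by len(x) * len(y).
    """
    wins = 0.0
    for xi in x:
        for yj in y:
            if xi > yj:
                wins += 1.0
            elif xi == yj:
                wins += 0.5  # ties count as half a "win"
    return wins / (len(x) * len(y))


def quantile(sorted_vals, p):
    """Linear-interpolation quantile of an already sorted list (0 <= p <= 1)."""
    pos = p * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = pos - lo
    return sorted_vals[lo] * (1.0 - frac) + sorted_vals[hi] * frac


def bca_ci_for_a(x, y, level=0.95, n_boot=2000, seed=0):
    """Hypothetical BCa bootstrap CI for A; x and y are lists, each with >= 2 scores."""
    rng = random.Random(seed)
    nd = NormalDist()
    a_hat = a_statistic(x, y)

    # Bootstrap distribution: resample within each group (assumption: the
    # groups are independent, so they are resampled separately).
    boots = sorted(
        a_statistic(rng.choices(x, k=len(x)), rng.choices(y, k=len(y)))
        for _ in range(n_boot)
    )

    # Bias correction z0 from the share of bootstrap values below the estimate.
    # Ties count as half; the share is clamped so inv_cdf stays finite.
    below = sum(b < a_hat for b in boots) + 0.5 * sum(b == a_hat for b in boots)
    prop = min(max(below / n_boot, 1.0 / n_boot), 1.0 - 1.0 / n_boot)
    z0 = nd.inv_cdf(prop)

    # Acceleration from a delete-one jackknife over every observation in both
    # groups (assumption: one simple choice for a two-sample statistic).
    jack = [a_statistic(x[:i] + x[i + 1:], y) for i in range(len(x))]
    jack += [a_statistic(x, y[:j] + y[j + 1:]) for j in range(len(y))]
    mean_jack = sum(jack) / len(jack)
    num = sum((mean_jack - t) ** 3 for t in jack)
    den = 6.0 * sum((mean_jack - t) ** 2 for t in jack) ** 1.5
    accel = num / den if den > 0 else 0.0

    # Shift the nominal percentile levels, then read the limits off the
    # sorted bootstrap distribution.
    alpha = (1.0 - level) / 2.0
    limits = []
    for z in (nd.inv_cdf(alpha), nd.inv_cdf(1.0 - alpha)):
        adj = nd.cdf(z0 + (z0 + z) / (1.0 - accel * (z0 + z)))
        limits.append(quantile(boots, adj))
    return a_hat, limits[0], limits[1]


if __name__ == "__main__":
    group1 = [3, 5, 6, 8, 9, 12]
    group2 = [1, 2, 4, 4, 7]
    print(a_statistic(group1, group2))  # 25 of 30 pairs favor group1
    print(bca_ci_for_a(group1, group2, n_boot=2000, seed=1))
```

In the usage example, group1 scores higher in 25 of the 30 cross-group pairs, so A = 25/30 ≈ 0.833. The CI limits depend on the random seed and on `n_boot`.

Resampling within each group keeps both sample sizes fixed. This matches the two-independent-groups design that the abstract describes.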