Schooler Lael J, Shiffrin Richard M
Center for Adaptive Behavior and Cognition, Max Planck Institute for Human Development, Berlin, Germany.
Behav Res Methods. 2005 Feb;37(1):3-10. doi: 10.3758/bf03206393.
We examine methods for measuring performance in signal-detection-like tasks when each participant provides only a few observations. Monte Carlo simulations demonstrate that standard statistical techniques applied to a d' analysis can lead to large numbers of Type I errors (incorrectly rejecting a hypothesis of no difference). Various statistical methods were compared in terms of their rates of Type I errors and Type II errors (incorrectly accepting a hypothesis of no difference). Our conclusions are the same whether these two types of errors are weighted equally or Type I errors are weighted more heavily. The most promising method is to combine an aggregate d' measure with a percentile bootstrap confidence interval, a computer-intensive nonparametric method of statistical inference. Researchers who prefer statistical techniques more commonly used in psychology, such as a repeated measures t test, should use gamma (Goodman & Kruskal, 1954), since it performs slightly better than or nearly as well as d'. In general, when repeated measures t tests are used, gamma is more conservative than d': It makes more Type II errors, but its Type I error rate tends to be much closer to the traditional .05 alpha level. It is somewhat surprising that gamma performs as well as it does, given that the simulations that generated the hypothetical data conformed completely to the d' model. Analyses in which H − FA (hits minus false alarms) was used had the highest Type I error rates. Detailed simulation results can be downloaded from www.psychonomic.org/archive/Schooler-BRM-2004.zip.
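As a rough illustration of the recommended approach, the sketch below pools counts across participants into one aggregate d' per condition and builds a percentile bootstrap confidence interval for the difference between two conditions by resampling participants with replacement. The abstract does not give the procedure in detail, so several choices here are assumptions rather than the authors' method: the function names (`aggregate_dprime`, `bootstrap_ci_diff`), the data layout (one tuple of counts per participant), the within-participant pairing of conditions, the number of resamples, and the 0.5 correction used to keep pooled rates away from 0 and 1.

```python
import random
from statistics import NormalDist


def aggregate_dprime(counts):
    """Aggregate d' from pooled counts.

    counts: list of (hits, n_signal, false_alarms, n_noise), one per participant.
    Assumption: a log-linear style correction (+0.5 to counts, +1 to totals)
    keeps the pooled rates strictly between 0 and 1.
    """
    hits = sum(c[0] for c in counts)
    n_signal = sum(c[1] for c in counts)
    false_alarms = sum(c[2] for c in counts)
    n_noise = sum(c[3] for c in counts)
    hit_rate = (hits + 0.5) / (n_signal + 1)
    fa_rate = (false_alarms + 0.5) / (n_noise + 1)
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(fa_rate)


def bootstrap_ci_diff(cond_a, cond_b, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for aggregate d'(A) - aggregate d'(B).

    cond_a[i] and cond_b[i] are assumed to come from the same participant,
    so participants are resampled as units (hypothetical design choice).
    """
    rng = random.Random(seed)
    n = len(cond_a)
    diffs = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample participants
        d_a = aggregate_dprime([cond_a[i] for i in idx])
        d_b = aggregate_dprime([cond_b[i] for i in idx])
        diffs.append(d_a - d_b)
    diffs.sort()
    k = int((alpha / 2) * n_boot)  # number of resamples cut from each tail
    return diffs[k], diffs[n_boot - 1 - k]


if __name__ == "__main__":
    # Made-up counts for six participants, four signal and four noise trials each.
    a = [(4, 4, 1, 4), (3, 4, 0, 4), (4, 4, 2, 4), (3, 4, 1, 4), (4, 4, 0, 4), (2, 4, 1, 4)]
    b = [(2, 4, 2, 4), (3, 4, 2, 4), (2, 4, 1, 4), (1, 4, 2, 4), (3, 4, 3, 4), (2, 4, 2, 4)]
    low, high = bootstrap_ci_diff(a, b)
    print(f"95% percentile bootstrap CI for the d' difference: [{low:.2f}, {high:.2f}]")
```

Under this sketch, a difference would be treated as reliable when the interval excludes zero. Pooling before computing d' avoids taking z-transforms of per-participant rates that are often exactly 0 or 1 when each participant contributes only a few trials.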