Efficiently measuring recognition performance with sparse data.

Authors

Schooler Lael J, Shiffrin Richard M

Affiliation

Center for Adaptive Behavior and Cognition, Max Planck Institute for Human Development, Berlin, Germany.

Publication information

Behav Res Methods. 2005 Feb;37(1):3-10. doi: 10.3758/bf03206393.

DOI:10.3758/bf03206393

PMID:16097339

Abstract

We examine methods for measuring performance in signal-detection-like tasks when each participant provides only a few observations. Monte Carlo simulations demonstrate that standard statistical techniques applied to a d' analysis can lead to large numbers of Type I errors (incorrectly rejecting a hypothesis of no difference). Various statistical methods were compared in terms of their Type I and Type II error (incorrectly accepting a hypothesis of no difference) rates. Our conclusions are the same whether these two types of errors are weighted equally or Type I errors are weighted more heavily. The most promising method is to combine an aggregate d' measure with a percentile bootstrap confidence interval, a computer-intensive nonparametric method of statistical inference. Researchers who prefer statistical techniques more commonly used in psychology, such as a repeated measures t test, should use gamma (Goodman & Kruskal, 1954), since it performs slightly better than or nearly as well as d'. In general, when repeated measures t tests are used, gamma is more conservative than d': It makes more Type II errors, but its Type I error rate tends to be much closer to that of the traditional .05 alpha level. It is somewhat surprising that gamma performs as well as it does, given that the simulations that generated the hypothetical data conformed completely to the d' model. Analyses in which H − FA was used had the highest Type I error rates. Detailed simulation results can be downloaded from www.psychonomic.org/archive/Schooler-BRM-2004.zip.
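To make the measures named in the abstract concrete, the following is a minimal Python sketch, not the authors' code. It computes d', H − FA, and Goodman-Kruskal gamma from a 2x2 table of hits, misses, false alarms, and correct rejections, then forms a percentile bootstrap confidence interval for the difference in aggregate d' between two within-participant conditions. Several details are assumptions that the abstract does not specify: adding 0.5 to each cell to avoid rates of exactly 0 or 1, pooling counts across participants to form the aggregate d', resampling whole participants with replacement, and returning 0.0 when gamma is undefined. All function names are hypothetical.

```python
import random
from statistics import NormalDist

_z = NormalDist().inv_cdf  # inverse standard normal CDF


def rates(hits, misses, fas, crs, correction=0.5):
    """Hit and false-alarm rates; the 0.5 cell correction is an assumption."""
    h = (hits + correction) / (hits + misses + 2 * correction)
    f = (fas + correction) / (fas + crs + 2 * correction)
    return h, f


def d_prime(h, f):
    """Equal-variance signal-detection sensitivity: z(H) - z(FA)."""
    return _z(h) - _z(f)


def h_minus_fa(h, f):
    """Hit rate minus false-alarm rate."""
    return h - f


def gamma_2x2(hits, misses, fas, crs):
    """Goodman-Kruskal gamma for a 2x2 table: (C - D) / (C + D)."""
    concordant = hits * crs
    discordant = misses * fas
    if concordant + discordant == 0:
        return 0.0  # gamma is undefined here; returning 0.0 is an assumption
    return (concordant - discordant) / (concordant + discordant)


def aggregate_d_prime(tables):
    """Pool counts over participants, then compute one d' (assumed scheme)."""
    pooled = [sum(cell) for cell in zip(*tables)]
    return d_prime(*rates(*pooled))


def bootstrap_ci(participants, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for aggregate d'(A) - d'(B).

    participants: list of (table_a, table_b) pairs, one pair per participant,
    where each table is (hits, misses, fas, crs).
    """
    rng = random.Random(seed)
    n = len(participants)
    diffs = []
    for _ in range(n_boot):
        # Resample whole participants with replacement.
        sample = [participants[rng.randrange(n)] for _ in range(n)]
        d_a = aggregate_d_prime([p[0] for p in sample])
        d_b = aggregate_d_prime([p[1] for p in sample])
        diffs.append(d_a - d_b)
    diffs.sort()
    lower = diffs[int((alpha / 2) * n_boot)]
    upper = diffs[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


if __name__ == "__main__":
    # Made-up data: 10 participants, 4 old and 4 new items per condition.
    data = [((4, 0, 0, 4), (2, 2, 2, 2))] * 10
    print(bootstrap_ci(data))
```

Under this sketch, a difference between conditions would be treated as reliable when the interval excludes zero. With only a handful of trials per participant, the per-participant rates are coarse, which is one reason pooling before computing d' may be attractive in the sparse-data setting the abstract describes.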

