Scoring best-worst data in unbalanced many-item designs, with applications to crowdsourcing semantic judgments.

Affiliation

Department of Psychology, University of Alberta, P217 Biological Sciences Building, Edmonton, Alberta, T6G 2E9, Canada.

Publication information

Behav Res Methods. 2018 Apr;50(2):711-729. doi: 10.3758/s13428-017-0898-2.

Abstract

Best-worst scaling is a judgment format in which participants are presented with a set of items and have to choose the superior and inferior items in the set. Best-worst scaling generates a large quantity of information per judgment because each judgment allows for inferences about the rank value of all unjudged items. This property of best-worst scaling makes it a promising judgment format for research in psychology and natural language processing concerned with estimating the semantic properties of tens of thousands of words. A variety of different scoring algorithms have been devised in the previous literature on best-worst scaling. However, due to problems of computational efficiency, these scoring algorithms cannot be applied efficiently to cases in which thousands of items need to be scored. New algorithms are presented here for converting responses from best-worst scaling into item scores for thousands of items (many-item scoring problems). These scoring algorithms are validated through simulation and empirical experiments, and considerations related to noise, the underlying distribution of true values, and trial design are identified that can affect the relative quality of the derived item scores. The newly introduced scoring algorithms consistently outperformed scoring algorithms used in the previous literature on scoring many-item best-worst data.
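
To make the judgment format concrete, here is a minimal Python sketch of simple best-minus-worst counting, a common baseline for best-worst data. It is not one of the newly introduced algorithms, which the abstract does not specify. The function name `best_worst_counts`, the trial layout, and the example words are all assumptions made for illustration.

```python
from collections import defaultdict


def best_worst_counts(trials):
    """Score each item as (times chosen best - times chosen worst) / times shown.

    `trials` is an iterable of (items, best, worst) tuples, where `items`
    is the set shown on one trial. This layout is a hypothetical one
    chosen for the example, not a format taken from the paper.
    """
    shown = defaultdict(int)
    best_n = defaultdict(int)
    worst_n = defaultdict(int)
    for items, best, worst in trials:
        if best not in items or worst not in items or best == worst:
            raise ValueError("best and worst must be distinct items from the trial")
        for item in items:
            shown[item] += 1
        best_n[best] += 1
        worst_n[worst] += 1
    # Dividing by exposure keeps scores comparable in unbalanced designs,
    # where items do not all appear the same number of times.
    return {item: (best_n[item] - worst_n[item]) / n for item, n in shown.items()}


# Three hypothetical four-item trials (made-up valence judgments).
trials = [
    (("joy", "table", "grief", "calm"), "joy", "grief"),
    (("calm", "grief", "table", "war"), "calm", "war"),
    (("joy", "war", "table", "calm"), "joy", "war"),
]
scores = best_worst_counts(trials)
for word, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{word:6s} {score:+.2f}")
```

Each score lies between -1 and +1. In a four-item trial, one best choice and one worst choice fix the order of five of the six item pairs, which illustrates how a single judgment carries information about items that were not chosen.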
