Department of Psychology, Texas A&M University.
J Appl Psychol. 2014 May;99(3):535-45. doi: 10.1037/a0035788. Epub 2014 Feb 3.
As a testing method, the efficacy of situational judgment tests (SJTs) is a function of a number of design features. One such design feature is the response format. However, despite the considerable interest in SJT design features, there is little guidance in the extant literature as to which response format is superior or the conditions under which one might be preferable to others. Using an integrity-based SJT measure administered to 31,194 job applicants, we present a comparative evaluation of 3 response formats (rate, rank, and most/least) in terms of construct-related validity, subgroup differences, and score reliability. The results indicate that the rate-SJT displayed stronger correlations with the hypothesized personality traits; weaker correlations with general mental ability and, consequently, lower levels of subgroup differences; and higher levels of internal consistency reliability. A follow-up study with 492 college students (Study 2; details of which are presented in the online supplemental materials) also indicates that the rate response format displayed higher levels of internal consistency and retest reliability as well as favorable reactions from test takers. However, it displayed the strongest relationships with a measure of response distortion, suggesting that it is more susceptible to this threat. Although there were a few exceptions, the rank and most/least response formats were generally quite similar in terms of several of the study outcomes. The results suggest that in the context of SJTs designed to measure noncognitive constructs, the rate response format appears to be the superior, preferred response format, with its main drawback being that it is susceptible to response distortion, although not any more so than the rank response format.