Jiang Zhehan, Raymond Mark
The University of Alabama, Tuscaloosa, AL, USA.
National Board of Medical Examiners, Philadelphia, PA, USA.
Appl Psychol Meas. 2018 Nov;42(8):595-612. doi: 10.1177/0146621618758698. Epub 2018 Apr 3.
Conventional methods for evaluating the utility of subscores rely on reliability and correlation coefficients. However, correlations can overlook a notable source of variability: variation in subtest means/difficulties. Brennan introduced a reliability index for score profiles based on multivariate generalizability theory, designated as 𝒢, which is sensitive to variation in subtest difficulty. However, there has been little, if any, research evaluating the properties of this index. A series of simulation experiments, as well as analyses of real data, were conducted to investigate 𝒢 under various conditions of subtest reliability, subtest correlations, and variability in subtest means. Three pilot studies evaluated 𝒢 in the context of a single group of examinees. Results of the pilots indicated that 𝒢 indices were typically low; across the 108 experimental conditions, 𝒢 ranged from .23 to .86, with an overall mean of .63. The findings were consistent with previous research, indicating that subscores often do not have interpretive value. Importantly, there were many conditions for which the correlation-based method known as proportion reduction in mean-square error (PRMSE; Haberman, 2006) indicated that subscores were worth reporting, but for which values of 𝒢 fell into the .50s, .60s, and .70s. The main study investigated 𝒢 within the context of score profiles for examinee subgroups. Again, not only were 𝒢 indices generally low, but 𝒢 was also found to be sensitive to subgroup differences when PRMSE is not. Analyses of real data and subsequent discussion address how 𝒢 can supplement PRMSE for characterizing the quality of subscores.
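The point that correlations can overlook variation in subtest means can be illustrated with a minimal sketch. It does not compute Brennan's profile index or PRMSE; it only shows, on made-up scores (the names `subtest_a`, `subtest_b`, and `harder_b` are hypothetical), that making one subtest uniformly harder leaves the Pearson correlation unchanged while the gap between subtest means grows.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length score lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mean(values):
    return sum(values) / len(values)


# Hypothetical scores for five examinees on two subtests (illustrative only).
subtest_a = [12.0, 15.0, 9.0, 18.0, 14.0]
subtest_b = [11.0, 16.0, 10.0, 17.0, 13.0]

# Assumption: a "harder" version of subtest B lowers every score by 5 points.
harder_b = [score - 5.0 for score in subtest_b]

r_original = pearson(subtest_a, subtest_b)
r_shifted = pearson(subtest_a, harder_b)

# Difference between subtest means before and after the difficulty shift.
mean_gap_original = mean(subtest_a) - mean(subtest_b)
mean_gap_shifted = mean(subtest_a) - mean(harder_b)

print("correlation, original:", round(r_original, 4))
print("correlation, shifted: ", round(r_shifted, 4))
print("mean gap, original:", round(mean_gap_original, 2))
print("mean gap, shifted: ", round(mean_gap_shifted, 2))
```

Because the correlation is computed from deviations around each subtest's own mean, a constant shift in difficulty cancels out; an index defined on the score profile itself would, in principle, register that shift.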