MacCann Robert G
Oxford University Centre for Educational Assessment, University of Oxford, Oxford OX2 6PY, UK.
J Appl Meas. 2009;10(4):438-54.
Using real data comprising responses to both dichotomously scored and constructed response items, this paper shows how Rasch modeling may be used to facilitate standard-setting. The modeling uses Andrich's Extended Logistic Model, which is incorporated into the RUMM software package. After a review of the fundamental equations of the model, an application to Bookmark standard-setting is given, showing how to calculate the bookmark difficulty location (BDL) both for dichotomous items and for tests containing a mixture of item types. An example showing how the bookmark is set is also discussed. The Rasch model is then applied in three ways to the Angoff standard-setting method. In the first Angoff approach, the judges' item ratings are compared to Rasch model expected scores, allowing the judges to find items where their ratings differ significantly from the Rasch model values. In the second Angoff approach, the distribution of item ratings is converted to a distribution of possible cutscores, from which a final cutscore may be selected. In the third Angoff approach, the Rasch model provides a comprehensive information set to the judges. For every total score on the test, the model provides a column of item ratings (expected scores) for the ability associated with that total score. The judges consider each column of item ratings as a whole and select the column that best fits the expected pattern of responses of a marginal candidate. The total score corresponding to the selected column is then the performance band cutscore.
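The two calculations named in the abstract, the bookmark difficulty location and the column of expected item scores for each total score, can be illustrated with a minimal sketch for dichotomous items only. The abstract gives neither the response probability used for the bookmark nor any item estimates, so everything specific here is an assumption: the response probability of 2/3 is merely a common choice in Bookmark practice, the item difficulties are hypothetical, and the function names are invented for illustration. The polytomous thresholds of the Extended Logistic Model, and the RUMM software itself, are not represented.

```python
import math

def rasch_p(theta, delta):
    """Rasch probability of success for ability theta on an item of difficulty delta."""
    return 1.0 / (1.0 + math.exp(-(theta - delta)))

def bookmark_location(delta, rp=2.0 / 3.0):
    """Ability at which the success probability equals rp (rp value is an assumption)."""
    return delta + math.log(rp / (1.0 - rp))

def ability_for_total(total, deltas, lo=-10.0, hi=10.0, tol=1e-9):
    """Ability whose expected total score equals `total`, found by bisection.

    The expected total score increases with ability, so bisection converges.
    Zero and perfect scores have no finite solution and are rejected.
    """
    if not 0 < total < len(deltas):
        raise ValueError("total must lie strictly between 0 and the number of items")
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if sum(rasch_p(mid, d) for d in deltas) < total:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def expected_score_columns(deltas):
    """Map each non-extreme total score to its column of expected item scores."""
    columns = {}
    for total in range(1, len(deltas)):
        theta = ability_for_total(total, deltas)
        columns[total] = [rasch_p(theta, d) for d in deltas]
    return columns

# Hypothetical item difficulties in logits, for illustration only.
deltas = [-1.5, -0.5, 0.0, 0.8, 1.6]
locations = [bookmark_location(d) for d in deltas]
columns = expected_score_columns(deltas)
for total, column in columns.items():
    print(total, [round(p, 2) for p in column])
```

In this sketch each printed row plays the role of one column of item ratings in the third Angoff approach: a judge would pick the row that best matches a marginal candidate, and the corresponding total score would serve as the cutscore.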