Hecht, Martin; Weirich, Sebastian; Siegle, Thilo; Frey, Andreas
Humboldt-Universität zu Berlin, Berlin, Germany.
Friedrich Schiller University Jena, Jena, Germany.
Educ Psychol Meas. 2015 Aug;75(4):568-584. doi: 10.1177/0013164414554219. Epub 2014 Nov 3.
Multiple matrix designs are commonly used in large-scale assessments to distribute test items to students. These designs comprise several booklets, each containing a subset of the complete item pool. Besides reducing the test burden on individual students, using various booklets allows the difficulty of the presented items to be aligned with the assumed performance level of the examined subgroups. While this may improve measurement precision and students' test-taking motivation, using several booklets might also influence response behavior and thus constitute a potential source of unwanted variation. To provide guidance for identifying and modeling booklet effects, this study presents statistical models that account for booklet effects and applies them in a large-scale assessment setting. Within the generalized linear mixed models framework, three models are derived from the Rasch model. The models were applied to data from a national educational standards assessment study of scientific competence. A total of 1,021 items were compiled into 74 booklets, which were distributed to a sample of 9,044 students in Grades 9 and 10. The results revealed a small but nonnegligible booklet effect. For further large-scale assessment studies, it is recommended to examine whether booklet effects occur and, where necessary, to account for them adequately in subsequent analyses.
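As an illustration only: the Rasch model can be written as a generalized linear mixed model with a logit link, in which person ability is a random effect and item difficulty a fixed effect. A booklet effect can then be added as an extra term on the logit scale. The parameterization below, with a random booklet intercept \(\delta_{b(p)}\) for the booklet \(b(p)\) assigned to person \(p\), is an assumption chosen for illustration and is not necessarily one of the three models derived in the study.

```latex
\operatorname{logit} P\left(X_{pi} = 1 \mid \theta_p, \delta_{b(p)}\right)
  = \theta_p - \beta_i + \delta_{b(p)},
\qquad
\theta_p \sim \mathcal{N}\left(0, \sigma_\theta^2\right),
\quad
\delta_b \sim \mathcal{N}\left(0, \sigma_\delta^2\right)
```

Here \(X_{pi}\) is the scored response of person \(p\) to item \(i\) and \(\beta_i\) is the difficulty of item \(i\). Under this sketch, \(\delta_b\) shifts the log-odds of a correct response for everyone who worked on booklet \(b\), beyond what abilities and item difficulties explain; \(\sigma_\delta^2 = 0\) reduces the model to the ordinary Rasch model, so examining whether booklet effects occur amounts to testing whether this variance component differs from zero.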