Matlock Ki Lynn, Turner Ronna
Oklahoma State University, Stillwater, OK, USA.
University of Arkansas, Fayetteville, AR, USA.
Educ Psychol Meas. 2016 Apr;76(2):258-279. doi: 10.1177/0013164415589756. Epub 2015 Jun 9.
When multiple test forms are constructed, the number of items and the total test difficulty are often held equivalent across forms. Not all test developers, however, match the number of items and/or the average item difficulty within subcontent areas. In this simulation study, six test forms were constructed with equal numbers of items and equal overall average item difficulty. The manipulated variables were the number of items and the average item difficulty within the subsets of items primarily measuring one of two dimensions. Data sets were simulated at four levels of correlation between the two dimensions (0, .3, .6, and .9). Item parameters were estimated using the unidimensional Rasch and two-parameter logistic (2PL) item response theory (IRT) models. Estimated discrimination and difficulty were compared across forms and against the true item parameters. The average estimated unidimensional discrimination was consistent across forms generated with the same correlation. Forms with a larger set of easy items measuring one dimension were estimated as more difficult than forms with a larger set of hard items measuring that dimension. Estimates were also examined within item subsets, and measures of bias are reported. This study encourages test developers to maintain consistent test specifications not only across forms as a whole but also within subcontent areas.
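The data-generation side of the design described above can be sketched in a few lines of Python. This is a minimal, hypothetical sketch under stated assumptions, not the authors' code: it assumes a between-item two-dimensional structure (each item measures exactly one dimension), abilities drawn from a standard bivariate normal with correlation `rho`, and 2PL item response functions. The function names (`make_form`, `simulate_thetas`, `simulate_responses`, etc.) and the form layouts (20 items split 15/5, difficulties of ±0.5 and ±1.5) are illustrative inventions, chosen only so that both forms share the same length and the same overall mean difficulty (0) while differing within dimensions. Fitting the unidimensional Rasch and 2PL models to the generated responses requires IRT estimation software and is not shown.

```python
import math
import random

# Hypothetical sketch: each item is an (a, b, dim) tuple, where a is the
# discrimination, b the difficulty, and dim (0 or 1) the single dimension
# the item measures (an assumed between-item two-dimensional structure).


def make_form(n_dim1, b_dim1, n_dim2, b_dim2, a=1.0):
    """Build a form with n_dim1 items on dimension 0 and n_dim2 items on dimension 1."""
    return [(a, b_dim1, 0)] * n_dim1 + [(a, b_dim2, 1)] * n_dim2


def mean_difficulty(form):
    """Average item difficulty over the whole form."""
    return sum(b for _, b, _ in form) / len(form)


def simulate_thetas(n_persons, rho, rng):
    """Draw (theta1, theta2) pairs from a standard bivariate normal with correlation rho."""
    thetas = []
    for _ in range(n_persons):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rng.gauss(0.0, 1.0)
        thetas.append((z1, rho * z1 + math.sqrt(1.0 - rho * rho) * z2))
    return thetas


def p_correct(a, b, theta):
    """2PL probability of a correct response (a common a for all items gives the Rasch form)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def simulate_responses(form, thetas, rng):
    """Persons x items matrix of 0/1 responses; each item uses only its own dimension."""
    return [
        [1 if rng.random() < p_correct(a, b, theta[dim]) else 0 for a, b, dim in form]
        for theta in thetas
    ]


def pearson(xs, ys):
    """Sample Pearson correlation, used to check the simulated ability correlation."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    rng = random.Random(2016)
    # Hypothetical layouts: same length and same overall mean difficulty (0),
    # but easy vs. hard items concentrated on dimension 0.
    form_a = make_form(15, -0.5, 5, 1.5)
    form_b = make_form(15, 0.5, 5, -1.5)
    for rho in (0.0, 0.3, 0.6, 0.9):
        thetas = simulate_thetas(2000, rho, rng)
        r = pearson([t[0] for t in thetas], [t[1] for t in thetas])
        resp_a = simulate_responses(form_a, thetas, rng)
        resp_b = simulate_responses(form_b, thetas, rng)
        pa = sum(map(sum, resp_a)) / (len(resp_a) * len(form_a))
        pb = sum(map(sum, resp_b)) / (len(resp_b) * len(form_b))
        print(f"rho={rho:.1f} sample r={r:.2f} "
              f"mean b A/B={mean_difficulty(form_a):.2f}/{mean_difficulty(form_b):.2f} "
              f"proportion correct A/B={pa:.3f}/{pb:.3f}")
```

Because `form_a` and `form_b` have the same number of items and the same overall mean difficulty, any difference between their unidimensional estimates can only come from how the items and their difficulties are distributed across the two dimensions, which is the kind of within-dimension mismatch the study manipulates.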