Lange Rense
Illinois State Board of Education, USA.
J Appl Meas. 2008;9(1):81-104.
Past research on Computer Adaptive Testing (CAT) has focused almost exclusively on binary items and on minimizing the number of items to be administered. To address this gap, extensive computer simulations were performed using partial credit items with two, three, four, and five response categories. Other variables manipulated included the number of available items, the number of respondents used to calibrate the items, and various manipulations of the respondents' true locations. Three item selection strategies were used: the theoretically optimal Maximum Information method was compared with random item selection and with a Bayesian Maximum Falsification approach. The Rasch partial credit model proved quite robust to various imperfections, and systematic distortions occurred mainly when too few items were located near the trait or performance levels of interest. The findings further indicate that, in practice, having few items is more problematic than having few respondents to calibrate those items. Most importantly, increasing the number of response categories consistently improved CAT's efficiency as well as the general quality of the results. In fact, increasing the number of response categories had a greater positive impact than the choice of item selection method, as the Maximum Information approach performed only slightly better than the Maximum Falsification approach. Accordingly, the efficiency of item selection methods matters far less than is commonly suggested in the literature. However, because these results rest solely on computer simulations, they presume that actual respondents behave according to the Rasch model. CAT research could thus benefit from empirical studies aimed at determining whether, and if so how, selection strategies affect performance.
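For readers unfamiliar with Maximum Information selection under the Rasch partial credit model, the following is a minimal sketch of the general technique, not the author's simulation code. It assumes each item is described only by its step difficulties (thresholds), and it uses the standard result that, for a Rasch-family item, Fisher information at a trait level equals the variance of the item score there. The function names (`pcm_probs`, `pcm_information`, `select_max_information`) and the item parameters in the usage example are hypothetical and chosen for illustration.

```python
import math

def pcm_probs(theta, thresholds):
    """Category probabilities (categories 0..m) under the Rasch partial credit model.

    thresholds: step difficulties delta_1..delta_m (hypothetical item parameters).
    """
    # Category k has exponent sum_{j<=k} (theta - delta_j); category 0 has exponent 0.
    exponents = [0.0]
    for delta in thresholds:
        exponents.append(exponents[-1] + (theta - delta))
    # Subtract the largest exponent before exponentiating, for numerical stability.
    top = max(exponents)
    weights = [math.exp(e - top) for e in exponents]
    total = sum(weights)
    return [w / total for w in weights]

def pcm_information(theta, thresholds):
    """Fisher information of a PCM item at theta, i.e. the variance of the item score."""
    probs = pcm_probs(theta, thresholds)
    mean = sum(k * p for k, p in enumerate(probs))
    return sum((k - mean) ** 2 * p for k, p in enumerate(probs))

def select_max_information(theta_hat, item_bank, administered):
    """Return the index of the unused item that is most informative at theta_hat."""
    candidates = [i for i in range(len(item_bank)) if i not in administered]
    return max(candidates, key=lambda i: pcm_information(theta_hat, item_bank[i]))

# Illustrative bank: a binary item, a three-category item, and a five-category item.
bank = [[-1.0], [0.0, 0.5], [-1.5, -0.5, 0.5, 1.5]]
print(select_max_information(0.0, bank, set()))  # the five-category item (index 2)
print(select_max_information(0.0, bank, {2}))    # next best once it is used (index 1)
```

In this toy bank the item with more response categories carries more information at the current estimate, which is consistent in direction with the abstract's point about response categories, but the example does not reproduce any of the study's results. Random selection would simply replace the `max` call with a random choice among the candidates.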