National Board of Medical Examiners, 3750 Market Street, Philadelphia, PA, 19104-3102, USA.
American Institutes for Research, 1000 Thomas Jefferson Street, NW, Washington D.C., 20007, USA.
Psychometrika. 2019 Mar;84(1):147-163. doi: 10.1007/s11336-018-09652-3. Epub 2019 Jan 3.
This paper provides results on a form of adaptive testing that is used frequently in intelligence testing. In these tests, items are presented in order of increasing difficulty, and the presentation is adaptive in the sense that a session is discontinued once a test taker produces a certain number of incorrect responses in sequence; the subsequent (not observed) responses are commonly scored as wrong. The Stanford-Binet Intelligence Scales (SB5; Riverside Publishing Company, 2003), the Kaufman Assessment Battery for Children (KABC-II; Kaufman and Kaufman, 2004), the Kaufman Adolescent and Adult Intelligence Test (Kaufman and Kaufman, 2014), and the Universal Nonverbal Intelligence Test (2nd ed.; Bracken and McCallum, 2015) are among the many tests that use this rule. He and Wolfe (Educ Psychol Meas 72(5):808-826, 2012. https://doi.org/10.1177/0013164412441937) compared different ability estimation methods in a simulation study of this discontinue-rule adaptation of test length. However, there has been, to our knowledge, no study based on analytic arguments drawing on probability theory of the underlying distributional properties of what these authors call stochastic censoring of responses. The results obtained by He and Wolfe (2012) agree with those of De Ayala et al. (J Educ Meas 38:213-234, 2001), Rose et al. (Modeling non-ignorable missing data with item response theory (IRT; ETS RR-10-11), Educational Testing Service, Princeton, 2010), and Rose et al. (Psychometrika 82:795-819, 2017. https://doi.org/10.1007/s11336-016-9544-7) in that ability estimates are most biased when the not observed responses are scored as wrong. Because this scoring is used operationally, more research is needed to improve practice in this field.
The paper extends existing research on discontinue-rule adaptivity in intelligence tests in two ways. First, an analytical study of the distributional properties of discontinue-rule scored items is presented. Second, a simulation study is presented that includes additional scoring rules and ability estimators that may be suitable for reducing bias in discontinue-rule scored intelligence tests.
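The discontinue rule described in the abstract (stop a session after a fixed run of consecutive incorrect responses, then score the remaining, unobserved items as wrong) can be sketched in a few lines. This is only an illustrative simulation under an assumed Rasch (1PL) response model; the function name, parameter names, and the choice of model are assumptions for illustration and are not taken from the paper.

```python
import math
import random

def administer_discontinue_rule(theta, difficulties, stop_after, seed=None):
    """Simulate one session of a discontinue-rule scored test.

    Items (assumed sorted by increasing difficulty) are presented until
    the test taker produces `stop_after` incorrect responses in a row.
    The remaining, not observed items are then scored as wrong (0),
    mirroring the operational scoring discussed in the abstract.
    """
    rng = random.Random(seed)
    responses = []
    consecutive_wrong = 0
    for b in difficulties:
        # Assumed Rasch model: P(correct) = 1 / (1 + exp(-(theta - b)))
        p_correct = 1.0 / (1.0 + math.exp(-(theta - b)))
        x = 1 if rng.random() < p_correct else 0
        responses.append(x)
        consecutive_wrong = 0 if x == 1 else consecutive_wrong + 1
        if consecutive_wrong >= stop_after:
            break  # session is discontinued here
    # Stochastic censoring: unobserved items scored as wrong.
    responses.extend([0] * (len(difficulties) - len(responses)))
    return responses
```

Because the stopping point depends on the (random) response sequence, the number of items actually observed varies across test takers with the same ability, which is the stochastic censoring the authors analyze.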