Heister Hannah M, Albers Casper J, Wiberg Marie, Timmerman Marieke E
Department of Psychometrics and Statistics, University of Groningen.
Department of Statistics, USBE, Umeå University.
Psychol Methods. 2024 Oct 14. doi: 10.1037/met0000686.
In norm-referenced psychological testing, an individual's performance is expressed relative to a reference population using a standardized score, such as an intelligence quotient score. The reference population can depend on a continuous variable, such as age. Current continuous norming methods transform the raw score into an age-dependent standardized score. Such methods have the shortcoming of relying solely on the raw test scores, ignoring valuable information from individual item responses. Instead of modeling the raw test scores, we propose modeling the item scores with a Bayesian two-parameter logistic (2PL) item response theory model with age-dependent mean and variance of the latent trait distribution, 2PL-norm for short. Norms are then derived using the estimated latent trait score and the age-dependent distribution parameters. Simulations show that 2PL-norms are overall more accurate than those from the most popular raw score-based norming methods, cNORM and generalized additive models for location, scale, and shape (GAMLSS). Furthermore, the credible intervals of 2PL-norm exhibit clearly superior coverage compared with the confidence intervals of the raw score-based methods. The only issue with 2PL-norm is its slightly lower performance at the tails of the norms. Among the raw score-based norming methods, GAMLSS outperforms cNORM. For empirical practice, this suggests using 2PL-norm if the model assumptions hold. If they do not hold, or if the interest is solely in the point estimates of the extreme trait positions, GAMLSS-based norming is a better alternative. The use of 2PL-norm is illustrated and compared with GAMLSS and cNORM using empirical data, and code is provided so that users can readily apply 2PL-norm to their normative data. (PsycInfo Database Record (c) 2025 APA, all rights reserved).
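As a rough illustration of the norming step described above, the following Python sketch converts an estimated latent trait score into an age-conditional standardized score. It is not the authors' code and not their model specification. The functions `mu_age` and `sigma_age`, their coefficients, the normality of the latent trait given age, and the IQ-style scale (mean 100, SD 15) are all assumptions made for illustration. In practice the age-dependent parameters and the trait estimate would come from the fitted Bayesian 2PL model.

```python
import math
from statistics import NormalDist


def p_correct(theta, a, b):
    """2PL item response function: probability of a correct response
    given latent trait theta, discrimination a, and difficulty b."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def mu_age(age):
    """Hypothetical age-dependent mean of the latent trait (illustrative
    linear form; the coefficients are made up, not estimated)."""
    return -1.0 + 0.1 * age


def sigma_age(age):
    """Hypothetical age-dependent SD of the latent trait (log-linear form
    keeps it positive; the coefficients are made up, not estimated)."""
    return math.exp(0.2 - 0.02 * age)


def normed_score(theta_hat, age, mean=100.0, sd=15.0):
    """Standardize an estimated trait score against the age-specific
    distribution, assuming the trait is normal given age.

    Returns the score on an IQ-style scale and the percentile rank."""
    z = (theta_hat - mu_age(age)) / sigma_age(age)
    percentile = 100.0 * NormalDist().cdf(z)
    return mean + sd * z, percentile


if __name__ == "__main__":
    # A 10-year-old whose trait estimate equals the age-specific mean
    # lands at the center of the norm scale.
    score, pct = normed_score(theta_hat=mu_age(10), age=10)
    print(f"score={score:.1f}, percentile={pct:.1f}")
```

Under these assumptions, the same trait estimate yields different normed scores at different ages, because both the mean and the spread of the reference distribution shift with age.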