Test Development Center, Psychometrica, Dettelbach, Bavaria, Germany.
Institute of Psychology, University of Wuerzburg, Bavaria, Germany.
PLoS One. 2019 Sep 17;14(9):e0222279. doi: 10.1371/journal.pone.0222279. eCollection 2019.
Continuous norming methods have seldom been subjected to scientific review. In this simulation study, we compared parametric with semi-parametric continuous norming methods in psychometric tests by constructing a fictitious population model within which a latent ability increases with age across seven age groups. We drew samples of different sizes (n = 50, 75, 100, 150, 250, 500, and 1,000 per age group) and simulated the results of easy, medium, and difficult test scales based on Item Response Theory (IRT). We subjected the resulting data to different continuous norming methods and compared the data fit under the different test conditions against a representative cross-validation dataset of n = 10,000 per age group. The largest differences between methods emerged for suboptimal (i.e., too easy or too difficult) test scales and for ability levels far from the population mean. We discuss the results with regard to the selection of appropriate modeling techniques in psychometric test construction, the required sample sizes, and the need to report appropriate quantitative and qualitative test quality criteria for continuous norming methods in test manuals.
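The simulation design described above (a latent ability increasing across seven age groups, with dichotomous item responses generated under IRT) can be sketched as follows. This is a minimal illustration, not the authors' actual code: the abstract does not specify the IRT model or parameter values, so a Rasch (1PL) model, the group means, and the item difficulties below are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(42)

def simulate_scale(n_per_group, age_means, item_difficulties):
    """Simulate raw scores per age group under a Rasch (1PL) model."""
    scores = {}
    for g, mu in enumerate(age_means):
        # Latent ability: normally distributed within each age group
        theta = rng.normal(mu, 1.0, size=n_per_group)
        # P(correct) for each person x item pair under the Rasch model
        p = 1.0 / (1.0 + np.exp(-(theta[:, None] - item_difficulties[None, :])))
        responses = rng.random(p.shape) < p       # dichotomous item responses
        scores[g] = responses.sum(axis=1)         # raw test scores
    return scores

# Seven age groups with mean ability increasing with age (illustrative values)
age_means = np.linspace(-1.5, 1.5, 7)
# An "easy" 20-item scale: difficulties mostly below the population mean,
# which produces ceiling effects in the older (higher-ability) groups
easy_items = np.linspace(-3.0, 0.0, 20)

raw = simulate_scale(n_per_group=100, age_means=age_means,
                     item_difficulties=easy_items)
group_means = [raw[g].mean() for g in range(7)]
```

Varying `item_difficulties` upward yields the medium and difficult scale conditions, and varying `n_per_group` over the sample sizes listed in the abstract reproduces the factorial design; the resulting raw scores would then be passed to the competing continuous norming procedures.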