Test Development Center, Psychometrica, Dettelbach, Bavaria, Germany.
Wolfgang Lenhard, Institute of Psychology, Julius-Maximilians-University of Würzburg, Bavaria, Germany.
Behav Res Methods. 2024 Aug;56(5):4632-4642. doi: 10.3758/s13428-023-02207-0. Epub 2023 Aug 21.
Norm scores are an essential source of information in individual diagnostics. Given the scope of the decisions this information may entail, establishing high-quality, representative norms is critically important in test construction. Representativeness is difficult to establish, though, especially with limited resources and when multiple stratification variables and their joint probabilities come into play. Sample stratification requires knowing which stratum an individual belongs to before data collection, but the variables needed to classify the individual, such as socio-economic status or demographic characteristics, are often collected only as part of the survey or test data. Post-stratification techniques such as iterative proportional fitting (also known as raking) therefore aim to simulate the representativeness of normative samples and can thus enhance the overall quality of the norm scores. This tutorial describes the application of raking to normative samples, the calculation of weights, the application of these weights in percentile estimation, and the retrieval of continuous, regression-based norm models with the cNORM package on the R platform. We demonstrate this procedure using a large, non-representative dataset of vocabulary development in childhood and adolescence (N = 4542), with sex and ethnic background as stratification variables.
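To make the weighting idea concrete, the following is a minimal Python sketch of raking and of a weighted percentile rank. It is an illustration only and not the cNORM implementation: the function names `rake` and `weighted_percentile_rank`, the toy sample, the category labels, and the target proportions are all hypothetical and are not taken from the tutorial's dataset. The sketch assumes that every category in the population margins occurs at least once in the sample and that the proportions for each variable sum to 1.

```python
from collections import defaultdict


def rake(sample, margins, max_iter=100, tol=1e-8):
    """Iterative proportional fitting (raking).

    sample  : list of dicts, one per person, mapping variable -> category
    margins : dict mapping variable -> {category: population proportion}
    Returns one weight per person, scaled to a mean of 1.
    """
    n = len(sample)
    weights = [1.0] * n
    for _ in range(max_iter):
        max_change = 0.0
        for var, target in margins.items():
            # Weighted total currently observed in each category of `var`.
            totals = defaultdict(float)
            for row, w in zip(sample, weights):
                totals[row[var]] += w
            total_w = sum(weights)
            # Rescale so the weighted shares of `var` match the target shares.
            for i, row in enumerate(sample):
                cat = row[var]
                factor = target[cat] * total_w / totals[cat]
                max_change = max(max_change, abs(factor - 1.0))
                weights[i] *= factor
        # Stop once a full pass over all variables changes almost nothing.
        if max_change < tol:
            break
    mean_w = sum(weights) / n
    return [w / mean_w for w in weights]


def weighted_percentile_rank(scores, weights, x):
    """Weighted percentage of cases scoring below x, counting ties as half."""
    below = sum(w for s, w in zip(scores, weights) if s < x)
    equal = sum(w for s, w in zip(scores, weights) if s == x)
    return 100.0 * (below + 0.5 * equal) / sum(weights)


# Hypothetical toy sample: 8 people, two stratification variables.
sample = (
    [{"sex": "f", "background": "a"}] * 4
    + [{"sex": "f", "background": "b"}] * 1
    + [{"sex": "m", "background": "a"}] * 2
    + [{"sex": "m", "background": "b"}] * 1
)
# Hypothetical population margins (proportions per variable sum to 1).
margins = {
    "sex": {"f": 0.5, "m": 0.5},
    "background": {"a": 0.7, "b": 0.3},
}
# Hypothetical raw scores, in the same order as `sample`.
scores = [12, 15, 15, 18, 9, 14, 20, 11]

weights = rake(sample, margins)
print([round(w, 3) for w in weights])
print(weighted_percentile_rank(scores, weights, 15))
```

Only the marginal proportions enter the procedure, which is why raking is useful when the joint distribution of the stratification variables in the population is unknown. Cases from under-represented strata receive weights above 1 and cases from over-represented strata receive weights below 1, and those weights then shift the percentile estimates accordingly.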