Buchholz Janine, Hartig Johannes
Deutsches Institut für Internationale Pädagogische Forschung, Frankfurt, Germany.
Appl Psychol Meas. 2019 May;43(3):241-250. doi: 10.1177/0146621617748323. Epub 2017 Dec 27.
Questionnaires for the assessment of attitudes and other psychological traits are crucial in educational and psychological research, and item response theory (IRT) has become a viable tool for scaling such data. Many international large-scale assessments aim to compare these constructs across countries, which requires measures that are invariant across countries. In its most recent cycle, the Programme for International Student Assessment (PISA 2015) implemented an innovative approach for testing the invariance of IRT-scaled constructs in the context questionnaires administered to students, parents, school principals, and teachers. On the basis of a concurrent calibration with equal item parameters across all groups (i.e., languages within countries), a group-specific item-fit statistic, the root mean square deviance (RMSD), was used as a measure of the invariance of item parameters for individual groups. The present simulation study examines the statistic's distribution under different types and extents of (non)invariance in polytomous items. Responses to five 4-point Likert-type items were generated under the generalized partial credit model (GPCM) for 1,000 simulees in each of 50 groups. For one of the five items, either location or discrimination parameters were drawn from a normal distribution. In addition to the type of noninvariance, its extent was varied by manipulating the variance of these distributions. The results indicate that the RMSD statistic detects noninvariance arising from between-group differences in item location better than noninvariance arising from differences in item discrimination. The study's findings may serve as a starting point for sensitivity analyses aimed at defining cutoff values for determining (non)invariance.
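The simulation design can be illustrated in code. The sketch below shows, under stated assumptions, the two ingredients the abstract describes: generating responses to a 4-point item under the GPCM, and computing an RMSD-style discrepancy between a group-specific item response function and the common (concurrently calibrated) one. Note that this is a hypothetical simplification: the operational PISA statistic compares observed and model-implied category probabilities weighted by each group's posterior ability distribution, whereas this sketch compares expected item scores computed from true parameters on a fixed normal ability grid. All function names are illustrative.

```python
import numpy as np


def gpcm_probs(theta, a, b):
    """GPCM category probabilities for one item.

    theta: 1-D array of abilities; a: discrimination; b: array of K-1
    step (location) parameters for a K-category item.
    Returns an array of shape (len(theta), K) whose rows sum to 1.
    """
    theta = np.atleast_1d(np.asarray(theta, dtype=float))
    K = len(b) + 1
    z = np.zeros((len(theta), K))
    # Cumulative logits: z_k = sum_{j<=k} a * (theta - b_j), z_0 = 0.
    for k in range(1, K):
        z[:, k] = z[:, k - 1] + a * (theta - b[k - 1])
    ez = np.exp(z - z.max(axis=1, keepdims=True))  # numerically stable softmax
    return ez / ez.sum(axis=1, keepdims=True)


def simulate_responses(theta, a, b, rng):
    """Draw one GPCM response (0..K-1) per ability value."""
    p = gpcm_probs(theta, a, b)
    cum = p.cumsum(axis=1)
    u = rng.random((len(p), 1))
    return (u > cum).sum(axis=1)  # inverse-CDF sampling of the category


def rmsd_expected_score(a_common, b_common, a_group, b_group,
                        theta_grid, weights):
    """RMSD between group-specific and common expected item scores,
    weighted by an ability density over theta_grid (weights sum to 1).

    A population-level analogue of the group-specific item-fit statistic;
    the operational version uses observed data and posterior weights.
    """
    cats = np.arange(len(b_common) + 1)
    e_common = gpcm_probs(theta_grid, a_common, b_common) @ cats
    e_group = gpcm_probs(theta_grid, a_group, b_group) @ cats
    return float(np.sqrt(np.sum(weights * (e_common - e_group) ** 2)))


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    b = np.array([-1.0, 0.0, 1.0])        # steps of a 4-point item
    theta_grid = np.linspace(-4.0, 4.0, 81)
    w = np.exp(-theta_grid ** 2 / 2)
    w /= w.sum()                          # standard-normal ability weights

    # An invariant group yields RMSD = 0; a location shift yields RMSD > 0.
    print(rmsd_expected_score(1.0, b, 1.0, b, theta_grid, w))
    print(rmsd_expected_score(1.0, b, 1.0, b + 0.5, theta_grid, w))
```

Within a full replication of the design, one would draw the noninvariant item's location (or discrimination) parameters from a normal distribution per group, simulate 1,000 respondents in each of 50 groups, recalibrate a common model, and inspect the resulting distribution of group-specific RMSD values.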