Pellegrino Gerardo, Saretzki Janika, Benedek Mathias
Department of General Psychology, University of Padova, 35131 Padova, Italy.
Department of Psychology, University of Graz, 8010 Graz, Austria.
J Intell. 2025 Jun 17;13(6):69. doi: 10.3390/jintelligence13060069.
Scoring divergent thinking (DT) tasks poses significant challenges because differences between raters affect the resulting scores. Item Response Theory (IRT) offers a statistical framework for handling differences in rater severity and discrimination. We applied this framework by re-analysing an open-access dataset comprising three scored DT tasks from 202 participants. First, after comparing different IRT models, we examined rater severity and discrimination parameters for both individual response scoring and snapshot scoring using the best-fitting model, the Graded Response Model. Second, we compared IRT-adjusted scores with unadjusted average and max-scoring scores in terms of reliability and the confounding effect of fluency. Additionally, we simulated missing data to assess the robustness of these approaches. Our results showed that IRT models can be applied to both individual response scoring and snapshot scoring. IRT-adjusted and unadjusted scores were highly correlated, indicating that, under conditions of high inter-rater agreement, rater variability in severity and discrimination does not substantially affect scores. Overall, our study confirms that IRT is a valuable statistical framework for modelling rater severity and discrimination across different DT scores, although further research is needed to clarify the conditions under which it offers the greatest practical benefit.
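To make the roles of rater severity and discrimination concrete, the following is a minimal sketch of how a Graded Response Model (Samejima's formulation) maps a person's latent creativity θ to the probability of each rating category, assuming each rater is treated as an "item" with a discrimination parameter `a` and ordered thresholds `b_1 < ... < b_{K-1}` (higher thresholds mean a more severe rater). The function names and all parameter values are hypothetical illustrations, not estimates from the study; the sketch does not estimate parameters, which in practice is done with dedicated IRT software.

```python
import math
from typing import List, Sequence


def grm_category_probs(theta: float, a: float, thresholds: Sequence[float]) -> List[float]:
    """P(X = k | theta) for k = 0..K-1 under the Graded Response Model.

    a          -- rater discrimination (hypothetical value supplied by caller)
    thresholds -- strictly increasing b_1 < ... < b_{K-1}; shifting them
                  upward models a more severe rater
    """
    if any(b2 <= b1 for b1, b2 in zip(thresholds, thresholds[1:])):
        raise ValueError("thresholds must be strictly increasing")
    # Cumulative probabilities P(X >= k): P(X >= 0) = 1 and P(X >= K) = 0,
    # with a logistic curve for each interior boundary.
    cumulative = [1.0]
    cumulative += [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds]
    cumulative.append(0.0)
    # Probability of exactly category k is the difference of adjacent cumulatives.
    return [cumulative[k] - cumulative[k + 1] for k in range(len(thresholds) + 1)]


def expected_rating(theta: float, a: float, thresholds: Sequence[float]) -> float:
    """Expected rating sum_k k * P(X = k | theta) for one rater."""
    probs = grm_category_probs(theta, a, thresholds)
    return sum(k * p for k, p in enumerate(probs))


# Hypothetical raters on a 0-4 scale: same discrimination, different severity.
lenient = [-1.5, -0.5, 0.5, 1.5]
severe = [b + 1.0 for b in lenient]  # every threshold shifted up by 1

for label, bs in [("lenient", lenient), ("severe", severe)]:
    print(label, round(expected_rating(theta=0.0, a=1.5, thresholds=bs), 2))
```

For the same person (θ = 0), the severe rater's expected rating is lower than the lenient rater's. An IRT-based score accounts for such shifts when estimating θ, whereas a simple average of raw ratings absorbs them as noise; when raters agree closely, as the abstract reports, the two approaches can yield highly correlated scores.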