Controlling Rater Effects in Divergent Thinking Assessment: An Item Response Theory Approach to Individual Response and Snapshot Scoring.

Authors

Pellegrino Gerardo, Saretzki Janika, Benedek Mathias

Affiliations

Department of General Psychology, University of Padova, 35131 Padova, Italy.

Department of Psychology, University of Graz, 8010 Graz, Austria.

Publication

J Intell. 2025 Jun 17;13(6):69. doi: 10.3390/jintelligence13060069.

Abstract

Scoring divergent thinking (DT) tasks is challenging because differences between raters affect the resulting scores. Item Response Theory (IRT) offers a statistical framework for handling differences in rater severity and discrimination. We applied this framework by re-analysing an open-access dataset of three scored DT tasks from 202 participants. First, after comparing different IRT models, we examined rater severity and discrimination parameters for individual response scoring and snapshot scoring using the best-fitting model, the Graded Response Model. Second, we compared IRT-adjusted scores with unadjusted average and max-scoring scores in terms of reliability and the fluency confound effect. Additionally, we simulated missing data to assess the robustness of these approaches. Our results showed that IRT models can be applied to both individual response scoring and snapshot scoring. IRT-adjusted and unadjusted scores were highly correlated, indicating that, under conditions of high inter-rater agreement, rater variability in severity and discrimination does not substantially affect scores. Overall, our study confirms that IRT is a valuable statistical framework for modeling rater severity and discrimination across different DT scores, although further research is needed to clarify the conditions under which it offers the greatest practical benefit.
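To make the roles of rater severity and discrimination concrete, the following is a minimal Python sketch of the textbook Graded Response Model, with each rater treated as an "item". It is an illustration under stated assumptions, not the authors' code or model specification: the function names, the four-point rating scale, and all parameter values are hypothetical, and the abstract does not report the exact parameterization used.

```python
import math


def grm_category_probs(theta, discrimination, thresholds):
    """Probabilities P(rating = k), k = 0..len(thresholds), under a graded response model.

    theta          -- latent quality of the response being rated
    discrimination -- rater slope (how sharply the rater separates quality levels)
    thresholds     -- ascending rater thresholds; higher values mean a more severe rater
    All values passed in below are hypothetical, chosen only for illustration.
    """
    # Cumulative probabilities P(rating >= k), with P(>= 0) = 1 and P(>= K + 1) = 0.
    cumulative = [1.0]
    for b in thresholds:
        cumulative.append(1.0 / (1.0 + math.exp(-discrimination * (theta - b))))
    cumulative.append(0.0)
    # Category probabilities are differences of adjacent cumulative probabilities.
    return [cumulative[k] - cumulative[k + 1] for k in range(len(thresholds) + 1)]


def expected_rating(theta, discrimination, thresholds):
    """Model-implied mean rating for a response of latent quality theta."""
    probs = grm_category_probs(theta, discrimination, thresholds)
    return sum(k * p for k, p in enumerate(probs))


# Two hypothetical raters judging the same response (theta = 0) on a 0-3 scale.
lenient = expected_rating(0.0, 1.5, [-2.0, -1.0, 0.0])
severe = expected_rating(0.0, 1.5, [0.0, 1.0, 2.0])
print(f"lenient rater: {lenient:.2f}, severe rater: {severe:.2f}")
```

In this sketch the two raters share the same discrimination but differ in thresholds, so the same response receives a lower expected rating from the severe rater. An IRT-adjusted score estimates theta while accounting for such differences, whereas a plain average of ratings does not.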

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a8b5/12194098/0bd6391a3a3d/jintelligence-13-00069-g001.jpg
