The University of Electro-Communications, Tokyo, Japan.
Behav Res Methods. 2024 Dec;56(8):8450-8479. doi: 10.3758/s13428-024-02485-2. Epub 2024 Aug 20.
For essay-writing tests, challenges arise when scores assigned to essays are influenced by the characteristics of raters, such as rater severity and consistency. Item response theory (IRT) models incorporating rater parameters have been developed to tackle this issue, exemplified by the many-facet Rasch model. These IRT models enable the estimation of examinees' abilities while accounting for the impact of rater characteristics, thereby enhancing the accuracy of ability measurement. However, difficulties can arise when different groups of examinees are evaluated by different sets of raters. In such cases, test linking is essential for unifying the scale of model parameters estimated for the individual examinee-rater groups. Traditional test-linking methods typically require administrators to design groups in which either examinees or raters are partially shared. However, this is often impractical in real-world testing scenarios. To address this, we introduce a novel method for linking the parameters of IRT models with rater parameters that uses neural automated essay scoring technology. Our experimental results indicate that our method accomplishes test linking with accuracy comparable to that of linear linking based on a small number of common examinees.
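For context, a common formulation of the many-facet Rasch model mentioned above extends the Rasch structure with a rater severity term; the exact parameterization used in the paper may differ, so this is only an illustrative sketch:

\[
\log \frac{P_{nijk}}{P_{nij(k-1)}} = \theta_n - \beta_i - \rho_j - \tau_k
\]

Here \(\theta_n\) is the ability of examinee \(n\), \(\beta_i\) the difficulty of task \(i\), \(\rho_j\) the severity of rater \(j\), and \(\tau_k\) the threshold of score category \(k\). Linking is needed precisely because \(\theta\), \(\beta\), and \(\rho\) estimated separately for each examinee-rater group are identified only up to a group-specific scale.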
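The linear-linking baseline cited in the results is typically implemented with a mean-sigma style transformation over common examinees; the specific variant used in the paper is not stated here, so the following is an assumed illustration:

\[
\theta^{*} = A\,\hat\theta^{\text{new}} + B, \qquad
A = \frac{\sigma\!\left(\hat\theta^{\text{ref}}_{\text{common}}\right)}{\sigma\!\left(\hat\theta^{\text{new}}_{\text{common}}\right)}, \qquad
B = \mu\!\left(\hat\theta^{\text{ref}}_{\text{common}}\right) - A\,\mu\!\left(\hat\theta^{\text{new}}_{\text{common}}\right)
\]

The means and standard deviations are computed from the ability estimates of examinees shared between the reference and new groups; the proposed method avoids the need for such common examinees by relying on neural automated essay scoring instead.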