Mohd Noh Muhamad Firdaus, Mohd Matore Mohd Effendi Ewan
Sekolah Rendah Agama Bersepadu Segamat, Johor, Malaysia.
Research Centre of Education Leadership and Policy, Faculty of Education, Universiti Kebangsaan Malaysia (UKM), Selangor, Malaysia.
Front Psychol. 2022 Jul 22;13:941084. doi: 10.3389/fpsyg.2022.941084. eCollection 2022.
Evaluating candidates' answers in speaking tests is difficult and rarely explored. The task is challenging and can introduce inconsistency in rating quality among raters, especially in speaking assessments. Severe raters do more harm than good to the results candidates receive. Many-facet Rasch measurement (MFRM) was used to explore differences in teachers' rating severity based on their rating experience, training experience, and teaching experience. The research used a quantitative approach and a survey method to recruit 164 English teachers of lower secondary school pupils, selected through a multistage cluster sampling procedure. All facets (teachers, candidates, items, and domains) were calibrated using MFRM. Every teacher scored six candidates' responses to a speaking test consisting of three question items, evaluated across three domains: vocabulary, grammar, and communicative competence. Results highlight that rating quality differed with teachers' rating experience and teaching experience. However, training experience made no difference to teachers' rating quality on the speaking test. The evidence from this study suggests that the two main factors of teaching experience and rating experience must be considered when appointing raters for a speaking test. The quality of training must be improved to produce raters with good professional judgment. Raters should be supplied with sample answers at varied levels of candidate performance to practice on before becoming good raters. Further research might explore other rater biases that may impact the psychological well-being of certain groups of students.
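To make the rater-severity facet concrete, the following is a minimal illustrative sketch (not the authors' implementation) of how a many-facet Rasch rating-scale model assigns score-category probabilities. In the standard MFRM formulation, the log-odds of adjacent categories is candidate ability minus item difficulty minus rater severity minus a category threshold; the parameter values and threshold set below are hypothetical, chosen only to show how a more severe rater shifts probability toward lower scores.

```python
import math

def mfrm_category_probs(ability, item_difficulty, rater_severity, thresholds):
    """Category probabilities under a many-facet Rasch rating-scale model.

    ln(P_k / P_{k-1}) = B_n - D_i - C_j - F_k, where B_n is candidate
    ability, D_i item difficulty, C_j rater severity, and F_k the
    Rasch-Andrich threshold for category k (all in logits).

    thresholds: F_1..F_m for an (m+1)-category scale.
    Returns [P(score=0), ..., P(score=m)] for one candidate-item-rater cell.
    """
    logit = ability - item_difficulty - rater_severity
    # Cumulative numerators: psi_0 = 0; psi_k = psi_{k-1} + (logit - F_k)
    psi = [0.0]
    for f in thresholds:
        psi.append(psi[-1] + logit - f)
    exp_psi = [math.exp(p) for p in psi]
    total = sum(exp_psi)
    return [e / total for e in exp_psi]

# Hypothetical example: same candidate and item, two raters on a 0-3 scale.
# A larger severity value models a harsher rater.
lenient = mfrm_category_probs(1.0, 0.0, -0.5, [-1.0, 0.0, 1.0])
severe = mfrm_category_probs(1.0, 0.0, 1.5, [-1.0, 0.0, 1.0])
```

Comparing the expected scores of the two distributions shows the severity effect the study measures: the harsher rater's probability mass sits on lower categories even though the candidate and item are identical.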