Mohd Noh Muhamad Firdaus, Mohd Matore Mohd Effendi Ewan
Sekolah Rendah Agama Bersepadu Segamat, Johor, Malaysia.
Research Centre of Education Leadership and Policy, Faculty of Education, Universiti Kebangsaan Malaysia (UKM), Selangor, Malaysia.
Front Psychol. 2022 Jul 22;13:941084. doi: 10.3389/fpsyg.2022.941084. eCollection 2022.
Evaluating candidates' answers in speaking tests is difficult and rarely explored. The task is challenging and can introduce inconsistency in rating quality among raters, especially in speaking assessments. Severe raters do more harm than good to the results candidates receive. Many-facet Rasch measurement (MFRM) was used to explore differences in teachers' rating severity based on their rating experience, training experience, and teaching experience. The research used a quantitative approach and a survey method to recruit 164 English teachers of lower secondary school pupils, selected through a multistage cluster sampling procedure. All facets (teachers, candidates, items, and domains) were calibrated using MFRM. Every teacher scored six candidates' responses to a speaking test consisting of three question items, evaluated across three domains: vocabulary, grammar, and communicative competence. Results highlight that rating quality differed with teachers' rating experience and teaching experience. However, training experience made no difference to teachers' rating quality on the speaking test. The evidence from this study suggests that the two main factors of teaching experience and rating experience must be considered when appointing raters for a speaking test. The quality of training must be improved to produce raters with good professional judgment. Raters should be supplied with sample answers at varied levels of candidate performance to practice on before becoming good raters. Further research might explore other rater biases that may impact the psychological well-being of certain groups of students.
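To make the rater-severity facet concrete, the following is a minimal illustrative sketch (not the authors' implementation) of how a many-facet Rasch rating-scale model assigns score-category probabilities. In the standard MFRM formulation, the log-odds of adjacent categories is candidate ability minus item difficulty minus rater severity minus a category threshold; the parameter values and threshold set below are hypothetical, chosen only to show how a more severe rater shifts probability toward lower scores.

```python
import math

def mfrm_category_probs(ability, item_difficulty, rater_severity, thresholds):
    """Category probabilities under a many-facet Rasch rating-scale model.

    ln(P_k / P_{k-1}) = B_n - D_i - C_j - F_k, where B_n is candidate
    ability, D_i item difficulty, C_j rater severity, and F_k the
    Rasch-Andrich threshold for category k (all in logits).

    thresholds: F_1..F_m for an (m+1)-category scale.
    Returns [P(score=0), ..., P(score=m)] for one candidate-item-rater cell.
    """
    logit = ability - item_difficulty - rater_severity
    # Cumulative numerators: psi_0 = 0; psi_k = psi_{k-1} + (logit - F_k)
    psi = [0.0]
    for f in thresholds:
        psi.append(psi[-1] + logit - f)
    exp_psi = [math.exp(p) for p in psi]
    total = sum(exp_psi)
    return [e / total for e in exp_psi]

# Hypothetical example: same candidate and item, two raters on a 0-3 scale.
# A larger severity value models a harsher rater.
lenient = mfrm_category_probs(1.0, 0.0, -0.5, [-1.0, 0.0, 1.0])
severe = mfrm_category_probs(1.0, 0.0, 1.5, [-1.0, 0.0, 1.0])
```

Comparing the expected scores of the two distributions shows the severity effect the study measures: the harsher rater's probability mass sits on lower categories even though the candidate and item are identical.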