


Item response theory model highlighting rating scale of a rubric and rater-rubric interaction in objective structured clinical examination.

Affiliations

Department of Computer and Network Engineering, The University of Electro-Communications, Chofu, Tokyo, Japan.

Institute of Education, Tokyo Medical and Dental University, Bunkyo-ku, Tokyo, Japan.

Publication Info

PLoS One. 2024 Sep 6;19(9):e0309887. doi: 10.1371/journal.pone.0309887. eCollection 2024.

DOI: 10.1371/journal.pone.0309887
PMID: 39240906
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11379165/
Abstract

Objective structured clinical examinations (OSCEs) are a widely used performance assessment for medical and dental students. A common limitation of OSCEs is that the evaluation results depend on the characteristics of raters and a scoring rubric. To overcome this limitation, item response theory (IRT) models such as the many-facet Rasch model have been proposed to estimate examinee abilities while taking into account the characteristics of raters and evaluation items in a rubric. However, conventional IRT models have two impractical assumptions: constant rater severity across all evaluation items in a rubric and an equal interval rating scale among evaluation items, which can decrease model fitting and ability measurement accuracy. To resolve this problem, we propose a new IRT model that introduces two parameters: (1) a rater-item interaction parameter representing the rater severity for each evaluation item and (2) an item-specific step-difficulty parameter representing the difference in rating scales among evaluation items. We demonstrate the effectiveness of the proposed model by applying it to actual data collected from a medical interview test conducted at Tokyo Medical and Dental University as part of a post-clinical clerkship OSCE. The experimental results showed that the proposed model was well-fitted to our OSCE data and measured ability accurately. Furthermore, it provided abundant information on rater and item characteristics that conventional models cannot, helping us to better understand rater and item properties.
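The abstract describes a many-facet Rasch-family model extended with (1) a rater-item interaction (a separate rater severity for each evaluation item) and (2) item-specific step difficulties. As a rough sketch of that structure — an illustrative adjacent-category (partial-credit style) parameterization, not the paper's exact formulation — the probability of score category k can be built from cumulative sums of logits of the form θ − β_i − ρ_ri − d_ik, where θ is examinee ability, β_i is item difficulty, ρ_ri is the rater-item severity, and d_ik are the item-specific steps. All symbol names and the exact parameterization here are assumptions:

```python
import math

def category_probs(theta, beta, rho_ri, steps):
    """Adjacent-category probabilities for one examinee-item-rater combination.

    theta  : examinee ability
    beta   : item difficulty
    rho_ri : rater severity for this specific item (rater-item interaction)
    steps  : item-specific step difficulties d_{i1}..d_{iK}
    Returns a list of probabilities for score categories 0..K.
    """
    # The numerator for category k is exp of the sum of the first k
    # adjacent-category logits; the empty sum for k = 0 is 0.
    logits = [0.0]
    for d in steps:
        logits.append(logits[-1] + (theta - beta - rho_ri - d))
    z = sum(math.exp(v) for v in logits)
    return [math.exp(v) / z for v in logits]

# A 4-category (0-3) item: a more severe rater (larger rho_ri) or harder
# steps shift probability mass toward lower score categories.
probs = category_probs(theta=0.5, beta=0.0, rho_ri=0.2, steps=[-1.0, 0.0, 1.0])
```

Setting `rho_ri` to a single per-rater constant and sharing one `steps` vector across items recovers the conventional many-facet Rasch assumptions the paper argues against.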


Figures (pone.0309887, g001-g006):
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bba7/11379165/18301a19d21a/pone.0309887.g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bba7/11379165/bf5f5ac68054/pone.0309887.g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bba7/11379165/7583f31455a5/pone.0309887.g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bba7/11379165/37ddbd1b0b8e/pone.0309887.g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bba7/11379165/c2275fb6662c/pone.0309887.g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bba7/11379165/3e25dc88d954/pone.0309887.g006.jpg

Similar Articles

1. Accuracy of performance-test linking based on a many-facet Rasch model.
   Behav Res Methods. 2021 Aug;53(4):1440-1454. doi: 10.3758/s13428-020-01498-x. Epub 2020 Nov 9.
2. Holistic rubric vs. analytic rubric for measuring clinical performance levels in medical students.
   BMC Med Educ. 2018 Jun 5;18(1):124. doi: 10.1186/s12909-018-1228-9.
3. A Bayesian many-facet Rasch model with Markov modeling for rater severity drift.
   Behav Res Methods. 2023 Oct;55(7):3910-3928. doi: 10.3758/s13428-022-01997-z. Epub 2022 Oct 25.
4. The presence and impact of local item dependence on objective structured clinical examinations scores and the potential use of the polytomous, many-facet Rasch model.
   J Manipulative Physiol Ther. 2006 Oct;29(8):651-7. doi: 10.1016/j.jmpt.2006.08.002.
5. Linking essay-writing tests using many-facet models and neural automated essay scoring.
   Behav Res Methods. 2024 Dec;56(8):8450-8479. doi: 10.3758/s13428-024-02485-2. Epub 2024 Aug 20.
6. Item response theory: applications of modern test theory in medical education.
   Med Educ. 2003 Aug;37(8):739-45. doi: 10.1046/j.1365-2923.2003.01587.x.
7. Can disclosure of scoring rubric for basic clinical skills improve objective structured clinical examination?
   Korean J Med Educ. 2016 Jun;28(2):179-83. doi: 10.3946/kjme.2016.28. Epub 2016 May 27.
8. Feasibility and reliability of the pandemic-adapted online-onsite hybrid graduation OSCE in Japan.
   Adv Health Sci Educ Theory Pract. 2024 Jul;29(3):949-965. doi: 10.1007/s10459-023-10290-3. Epub 2023 Oct 18.
9. Effect of moderation on rubric criteria for inter-rater reliability in an objective structured clinical examination with real patients.
   Fujita Med J. 2022 Aug;8(3):83-87. doi: 10.20407/fmj.2021-010. Epub 2021 Nov 25.

References Cited in This Article

1. A Bayesian many-facet Rasch model with Markov modeling for rater severity drift.
   Behav Res Methods. 2023 Oct;55(7):3910-3928. doi: 10.3758/s13428-022-01997-z. Epub 2022 Oct 25.
2. Determining the influence of different linking patterns on the stability of students' score adjustments produced using Video-based Examiner Score Comparison and Adjustment (VESCA).
   BMC Med Educ. 2022 Jan 17;22(1):41. doi: 10.1186/s12909-022-03115-1.
3. Using the Many-Facet Rasch Model to analyse and evaluate the quality of objective structured clinical examination: a non-experimental cross-sectional design.
   BMJ Open. 2019 Sep 6;9(9):e029208. doi: 10.1136/bmjopen-2019-029208.
4. Exploring the Combined Effects of Rater Misfit and Differential Rater Functioning in Performance Assessments.
   Educ Psychol Meas. 2019 Oct;79(5):962-987. doi: 10.1177/0013164419834613. Epub 2019 Apr 2.
5. Trifactor Models for Multiple-Ratings Data.
   Multivariate Behav Res. 2019 May-Jun;54(3):360-381. doi: 10.1080/00273171.2018.1530091. Epub 2019 Mar 28.
6. Developing a video-based method to compare and adjust examiner effects in fully nested OSCEs.
   Med Educ. 2019 Mar;53(3):250-263. doi: 10.1111/medu.13783. Epub 2018 Dec 21.
7. Rater Model Using Signal Detection Theory for Latent Differential Rater Functioning.
   Multivariate Behav Res. 2019 Jul-Aug;54(4):492-504. doi: 10.1080/00273171.2018.1522496. Epub 2018 Dec 17.
8. Estimating Optimal Weights for Compound Scores: A Multidimensional IRT Approach.
   Multivariate Behav Res. 2018 Nov-Dec;53(6):914-924. doi: 10.1080/00273171.2018.1478712. Epub 2018 Nov 21.
9. Simple Structure Detection Through Bayesian Exploratory Multidimensional IRT Models.
   Multivariate Behav Res. 2019 Jan-Feb;54(1):100-112. doi: 10.1080/00273171.2018.1496317. Epub 2018 Nov 7.
10. Using the Stan Program for Bayesian Item Response Theory.
   Educ Psychol Meas. 2018 Jun;78(3):384-408. doi: 10.1177/0013164417693666. Epub 2017 Feb 1.