Detecting rater bias using a person-fit statistic: a Monte Carlo simulation study.

Affiliations

Université de Sherbrooke, Sherbrooke, Québec, Canada.

Université Laval, Québec, Québec, Canada.

Publication information

Perspect Med Educ. 2018 Apr;7(2):83-92. doi: 10.1007/s40037-017-0391-8.

DOI:10.1007/s40037-017-0391-8
PMID:29294255
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5889374/
Abstract

INTRODUCTION

With the Standards voicing concern for the appropriateness of response processes, we need to explore strategies that would allow us to identify inappropriate rater response processes. Although certain statistics can be used to help detect rater bias, their use is complicated by either a lack of data about their actual power to detect rater bias or the difficulty of applying them in the context of health professions education. This exploratory study aimed to establish whether it is worth pursuing the use of l_z to detect rater bias.

METHODS

We conducted a Monte Carlo simulation study to investigate the power of a specific detection statistic, namely the standardized likelihood person-fit statistic (PFS) l_z. Our primary outcome was the rate at which the l_z statistic detected biased raters, that is, raters whom we manipulated into being either stringent (giving lower scores) or lenient (giving higher scores), while controlling for the number of biased raters in a sample (6 levels) and the rate of bias per rater (6 levels).
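To make the detection statistic concrete, the following is a minimal sketch of how a standardized likelihood statistic of the l_z type can be computed for one scored checklist. It assumes dichotomous items (1 = credited, 0 = not credited) and a Rasch model with known item difficulties; the abstract does not state which model the authors used, and the names `rasch_prob` and `lz_statistic` as well as the example difficulties are hypothetical, chosen for illustration only.

```python
import math


def rasch_prob(theta, b):
    """Probability of crediting an item under a Rasch model (assumed here)."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def lz_statistic(responses, probs):
    """Standardized log-likelihood person-fit statistic for 0/1 responses.

    responses: observed item scores (0 or 1)
    probs: model-implied probabilities of a score of 1, each strictly in (0, 1)
    """
    # Observed log-likelihood of the response pattern.
    l0 = sum(u * math.log(p) + (1 - u) * math.log(1 - p)
             for u, p in zip(responses, probs))
    # Expected value and variance of the log-likelihood under the model.
    expected = sum(p * math.log(p) + (1 - p) * math.log(1 - p) for p in probs)
    variance = sum(p * (1 - p) * math.log(p / (1 - p)) ** 2 for p in probs)
    return (l0 - expected) / math.sqrt(variance)


# Hypothetical example: four items, candidate ability fixed at 0.
difficulties = [-2.0, -1.0, 1.0, 2.0]
probs = [rasch_prob(0.0, b) for b in difficulties]

consistent = [1, 1, 0, 0]  # easy items credited, hard items not
aberrant = [0, 0, 1, 1]    # the reverse pattern, which the model finds unlikely

lz_consistent = lz_statistic(consistent, probs)
lz_aberrant = lz_statistic(aberrant, probs)
print(round(lz_consistent, 2), round(lz_aberrant, 2))
```

Large negative values indicate a score pattern that is unlikely under the model. In a simulation of this kind, a pattern could be flagged when the statistic falls below a chosen cutoff (for example -1.645 for a one-sided 5% test under a normal approximation), and the detection rate is the share of manipulated raters that are flagged. The variance is zero if every probability equals 0.5, so that case would need separate handling.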

RESULTS

Overall, stringent raters (M = 0.84, SD = 0.23) were easier to detect than lenient raters (M = 0.31, SD = 0.28). More biased raters were easier to detect than less biased raters (60% bias: M = 0.62, SD = 0.37; 10% bias: M = 0.43, SD = 0.36).

DISCUSSION

The PFS l_z appears to have potential for identifying biased raters. We observed detection rates as high as 90% for stringent raters for whom we manipulated more than half of their checklist. Although these results are promising, we cannot generalize them to the use of PFS with estimated item/station parameters or with real data. Studies under those conditions should be conducted to assess the feasibility of using PFS to identify rater bias.

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9820/5889374/54082edd488e/40037_2017_391_Fig1_HTML.jpg

Similar articles

1. Detecting rater bias using a person-fit statistic: a Monte Carlo simulation study.
Perspect Med Educ. 2018 Apr;7(2):83-92. doi: 10.1007/s40037-017-0391-8.
2. Implicit versus explicit first impressions in performance-based assessment: will raters overcome their first impressions when learner performance changes?
Adv Health Sci Educ Theory Pract. 2024 Sep;29(4):1155-1168. doi: 10.1007/s10459-023-10302-2. Epub 2023 Nov 27.
3. Interrater Reliability of Standardized Actors Versus Nonactors in a Simulation Based Assessment of Interprofessional Collaboration.
Simul Healthc. 2015 Aug;10(4):249-55. doi: 10.1097/SIH.0000000000000094.
4. Assessing rater performance without a "gold standard" using consensus theory.
Med Decis Making. 1997 Jan-Mar;17(1):71-9. doi: 10.1177/0272989X9701700108.
5. Folic acid supplementation and malaria susceptibility and severity among people taking antifolate antimalarial drugs in endemic areas.
Cochrane Database Syst Rev. 2022 Feb 1;2(2022):CD014217. doi: 10.1002/14651858.CD014217.
6. Detection of Biased Rating of Medical Students by Standardized Patients: Opportunity for Improvement.
Med Sci Educ. 2017 Sep;27(3):497-502. doi: 10.1007/s40670-017-0418-0. Epub 2017 Jun 5.
7. A method for identifying extreme OSCE examiners.
Clin Teach. 2013 Feb;10(1):27-31. doi: 10.1111/j.1743-498X.2012.00607.x.
8. Are raters influenced by prior information about a learner? A review of assimilation and contrast effects in assessment.
Adv Health Sci Educ Theory Pract. 2021 Aug;26(3):1133-1156. doi: 10.1007/s10459-021-10032-3. Epub 2021 Feb 10.
9. Does a Rater's Professional Background Influence Communication Skills Assessment?
J Vet Med Educ. 2015 Winter;42(4):315-23. doi: 10.3138/jvme.0215-023R. Epub 2015 Aug 28.
10. Workplace-based assessment: raters' performance theories and constructs.
Adv Health Sci Educ Theory Pract. 2013 Aug;18(3):375-96. doi: 10.1007/s10459-012-9376-x. Epub 2012 May 17.

Cited by

1. Eleven ways to get a grip on the implementation of remote administration of high-stakes assessments.
Can Med Educ J. 2022 Aug 26;13(4):3-7. doi: 10.36834/cmej.73734. eCollection 2022 Aug.