Adjusting for Resident Rater Leniency or Severity Improves the Reliability of Routine Resident Evaluations of Faculty Anesthesiologists.

Author information

Dexter Franklin, Vasilopoulos Terrie, Fahy Brenda G

Affiliations

Anesthesia, University of Iowa, Iowa City, USA.

Anesthesiology/Orthopedics and Rehabilitation, University of Florida College of Medicine, Gainesville, USA.

Publication information

Cureus. 2025 Jun 19;17(6):e86366. doi: 10.7759/cureus.86366. eCollection 2025 Jun.

Abstract

Background: The Accreditation Council for Graduate Medical Education (ACGME) of the United States requires all programs to evaluate faculty performance annually. Multiple universities require all faculty to be reviewed annually. These high-stakes evaluations should be reliable. When one anesthesiologist is said to perform better than another, there should be neither frequent Type I errors (i.e., an anesthesiologist is determined to perform better or worse than average when their performance is average) nor frequent Type II errors (i.e., failure to detect above- or below-average performance). We investigated the generalizability of the finding that, if adjustment is not made for rater leniency/severity, results will be statistically unreliable.

Methods: University of Florida 11-item evaluations were sent on Mondays over the 2018-19 academic year. The 108 ratees (anesthesiologists) had 3302 evaluations by 85 raters (resident physicians). The replicability of the results was assessed by comparison with previously published findings from the University of Toronto and the University of Iowa.

Results: As observed at the University of Toronto, there was greater heterogeneity of scores among raters than among ratees (raters' eta-squared 0.40; ratees' 0.22). As observed at the University of Iowa, the Florida rater leniency/severity of scores could not validly be modeled based on a normal distribution, because the distribution of each rater's mean among raters was not normally distributed (Shapiro-Wilk W = 0.90 (P = 0.00002) among the 75 raters with ≤9 evaluations). Likewise, matching Iowa, Florida's distribution of each ratee's mean among ratees was not normally distributed (W = 0.91 (P = 0.00001) among the 94 ratees with ≤9 evaluations). In contrast, when evaluations with all items scored at the maximum were treated as having a value of 1, and otherwise 0, Florida's corresponding distributions of logits were normally distributed, as at Iowa (W = 0.99 (P = 0.90) among raters and W = 0.98 (P = 0.09) among ratees, respectively). Rater leniency/severity remained large in the logit scale, with an intraclass correlation coefficient of 0.55. In the original scale, 0/108 ratees had performance that differed significantly from the grand mean of 4.63, using a P < 0.01 criterion. The alternative analysis approach adjusted for the raters' leniency/severity. With it, seven ratees were significantly below average (P ≤ 0.0048) and 17 were significantly above average (P ≤ 0.0086). Because the statistical assumptions of the adjusted analysis were satisfied, the analysis in the original scale had a 22% (24/108) false negative rate, similar to the 21% observed previously at the University of Iowa.

Conclusions: Routine evaluations of faculty anesthesiologist ratees by anesthesiology resident raters give statistically unreliable results, falsely categorizing performance, unless analyses are adjusted for the covariates of raters. The need for adjustment found with the University of Florida data matches the need for this type of adjustment found at the University of Iowa and the University of Toronto. Thus, the need to adjust for raters' leniency/severity appears to be a general finding for routine rater/ratee evaluations.
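The transformation described in the Results can be illustrated with a minimal sketch. This is not the authors' analysis code. The toy data, the function names, the assumed maximum item score of 5, the one-way form of eta-squared, and the 0.5 continuity correction in the logit are all assumptions chosen for illustration. The sketch shows how an evaluation is recoded as 1 only when every item has the maximum score, how heterogeneity among raters can be compared with that among ratees, and how the 22% figure follows from the reported counts.

```python
import math
from collections import defaultdict

def binarize(item_scores, max_score=5):
    """Return 1 if every item has the maximum score, otherwise 0.
    max_score=5 is an assumption; the abstract does not state the scale."""
    return int(all(score == max_score for score in item_scores))

def eta_squared(values, groups):
    """One-way eta-squared: between-group sum of squares / total sum of squares.
    The study may have used a different ANOVA layout (assumption)."""
    grand_mean = sum(values) / len(values)
    ss_total = sum((v - grand_mean) ** 2 for v in values)
    by_group = defaultdict(list)
    for value, group in zip(values, groups):
        by_group[group].append(value)
    ss_between = sum(
        len(vs) * (sum(vs) / len(vs) - grand_mean) ** 2 for vs in by_group.values()
    )
    return ss_between / ss_total

def empirical_logit(successes, n):
    """Log-odds with a 0.5 continuity correction (an illustrative choice),
    so that proportions of 0 or 1 stay finite."""
    return math.log((successes + 0.5) / (n - successes + 0.5))

# Hypothetical toy data, NOT from the study: (rater, ratee, 11 item scores).
evaluations = [
    ("r1", "a1", [5] * 11),
    ("r1", "a2", [5] * 11),
    ("r2", "a1", [5] * 10 + [4]),
    ("r2", "a2", [4] * 11),
]

raters = [rater for rater, _, _ in evaluations]
ratees = [ratee for _, ratee, _ in evaluations]
mean_scores = [sum(items) / len(items) for _, _, items in evaluations]
binary = [binarize(items) for _, _, items in evaluations]

# Heterogeneity of mean scores among raters versus among ratees.
rater_eta = eta_squared(mean_scores, raters)
ratee_eta = eta_squared(mean_scores, ratees)

# Logit of each rater's proportion of all-maximum evaluations.
counts = defaultdict(lambda: [0, 0])  # rater -> [all-maximum count, total count]
for rater, value in zip(raters, binary):
    counts[rater][0] += value
    counts[rater][1] += 1
rater_logits = {rater: empirical_logit(k, n) for rater, (k, n) in counts.items()}

# Arithmetic behind the reported false negative rate: (7 + 17) of 108 ratees.
false_negative_rate = (7 + 17) / 108
```

In this toy data the lenient rater `r1` gives all-maximum scores to both ratees, so the rater factor accounts for more of the variation in mean scores than the ratee factor does. That is the pattern the abstract reports on real data. A full analysis would fit a model that estimates rater and ratee effects jointly, which this sketch does not attempt.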

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/746c/12276026/cb3888ff3859/cureus-0017-00000086366-i01.jpg
