Langhan Melissa L, Tiyyagura Gunjan
Departments of Pediatrics and Emergency Medicine, Section of Pediatric Emergency Medicine, Yale University School of Medicine, New Haven, Conn 06510.
Acad Pediatr. 2022 Mar;22(2):313-318. doi: 10.1016/j.acap.2021.11.018. Epub 2021 Dec 2.
No standardized evaluation tool for fellowship applicant assessment exists. Assessment tools are subject to biases and scoring tendencies, which can skew scores and affect rankings. We aimed to develop and evaluate an objective assessment tool for fellowship applicants.
We detected rater effects in our numerically scaled assessment tool (NST), which consisted of 10 domains rated from 0 to 9. We evaluated each domain, consolidated redundant categories, and removed subjective categories. For the 7 remaining domains, we described each quality and developed a question with a behaviorally anchored rating scale (BARS). Applicants were rated by 6 attendings. Ratings from the NST in 2018 were compared with the BARS from 2020 for distribution of data, skewness, and inter-rater reliability.
Thirty-four applicants were evaluated with the NST and 38 with the BARS. Demographics were similar between groups. The median score on the NST was 8 out of 9; scores <5 were used in less than 1% of all evaluations. The distribution of data improved with the BARS tool. In the NST, 6 of 10 domains demonstrated moderate skewness and 3 demonstrated high skewness. Three of the 7 domains in the BARS showed moderate skewness, and none showed high skewness. Two of 10 domains in the NST vs 5 of 7 domains in the BARS achieved good inter-rater reliability.
Replacing a standard numeric scale with a BARS normalized the distribution of data, reduced skewness, and enhanced inter-rater reliability in our evaluation tool. This provides some validity evidence for improved applicant assessment and ranking.
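The two statistics the study compares, skewness of the rating distributions and inter-rater reliability, can be illustrated with a minimal sketch. The ratings below are invented for demonstration only (they are not the study's data), and the ICC formula used is a standard two-way random-effects ICC(2,1) for absolute agreement, which is one common choice for inter-rater reliability but is not stated to be the authors' exact method.

```python
# Illustration only: hypothetical ratings, not data from the study.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Simulated NST-like ratings: 6 raters x 8 applicants on a 0-9 scale,
# clustered near the top to mimic the ceiling effect described above.
nst = np.clip(np.round(rng.normal(8, 1, size=(6, 8))), 0, 9)

def icc2_1(ratings):
    """Two-way random-effects ICC(2,1), absolute agreement.
    ratings: raters x targets array."""
    k, n = ratings.shape
    grand = ratings.mean()
    target_means = ratings.mean(axis=0)
    rater_means = ratings.mean(axis=1)
    ss_targets = k * ((target_means - grand) ** 2).sum()
    ss_raters = n * ((rater_means - grand) ** 2).sum()
    ss_error = ((ratings - grand) ** 2).sum() - ss_targets - ss_raters
    ms_targets = ss_targets / (n - 1)
    ms_raters = ss_raters / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_targets - ms_error) / (
        ms_targets + (k - 1) * ms_error + k * (ms_raters - ms_error) / n
    )

# Ceiling-clustered ratings produce a left-skewed (negative) distribution.
print("skewness:", stats.skew(nst.ravel()))
print("ICC(2,1):", icc2_1(nst))
```

A strongly negative skewness value reflects the pile-up of scores at the top of the scale that the NST showed; an ICC near 1 indicates raters agree, while values near 0 indicate poor agreement.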