Sebok Stefanie S, Syer Mark D
Acad Med. 2015 Nov;90(11 Suppl):S50-5. doi: 10.1097/ACM.0000000000000902.
Raters represent a significant source of unexplained, and often undesired, variance in performance-based assessments. To better understand rater variance, this study investigated how various raters, observing the same performance, perceived relationships amongst different noncognitive attributes measured in performance assessments.
Medical admissions data from a Multiple Mini-Interview (MMI) used at one Canadian medical school were collected and subsequently analyzed using the Many Facet Rasch Model (MFRM) and hierarchical clustering. This particular MMI consisted of eight stations. At each station a faculty member and an upper-year medical student rated applicants on various noncognitive attributes including communication, critical thinking, effectiveness, empathy, integrity, maturity, professionalism, and resolution.
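For orientation, a common rating-scale form of the Many Facet Rasch Model is given below. The abstract does not state the exact specification used, so the choice of facets (applicant, attribute, rater) and all symbols are assumptions made for illustration only.

```latex
% Assumed facets: applicant n, attribute i, rater j, rating category k.
% P_{nijk} is the probability that rater j awards applicant n category k on attribute i.
\ln\!\left(\frac{P_{nijk}}{P_{nij(k-1)}}\right) = \theta_n - \delta_i - \alpha_j - \tau_k
% \theta_n : ability of applicant n
% \delta_i : difficulty of attribute i
% \alpha_j : severity of rater j
% \tau_k   : threshold between categories k-1 and k
```

Under this form, a rater's severity enters as its own additive term, which is what allows rater effects to be separated from applicant ability.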
The Rasch analyses revealed differences between faculty and student raters across the eight MMI stations. These analyses also indicated that, at times, raters were unable to distinguish between the various noncognitive attributes. Hierarchical clustering highlighted differences in how faculty and student raters perceived the various noncognitive attributes. Differences in how individual raters associated the attributes within a station were also observed.
The MFRM and hierarchical clustering helped to explain some of the variability associated with raters in a way that other measurement models are unable to capture. These findings highlight that differences in ratings may result from raters possessing different interpretations of an observed performance. This study has implications for developing more purposeful rater selection and rater profiling in performance-based assessments.
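The hierarchical clustering step can be illustrated with a minimal sketch. It is not the authors' procedure: the ratings are synthetic, only four of the eight attributes are shown, and the distance measure (1 minus the Pearson correlation) and average linkage are assumptions chosen for illustration. The function `cluster_attributes` is a hypothetical helper.

```python
from itertools import combinations
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length score lists (non-zero variance assumed)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def cluster_attributes(ratings):
    """Agglomerative clustering of attributes with average linkage.

    ratings maps an attribute name to one rater's scores for the same
    applicants, in the same order. Returns the merges in the order they
    occur, as (cluster_a, cluster_b, distance) tuples.
    """
    # Assumed distance: attributes rated alike across applicants are "close".
    dist = {
        frozenset((a, b)): 1.0 - pearson(ratings[a], ratings[b])
        for a, b in combinations(ratings, 2)
    }
    clusters = [(name,) for name in ratings]
    merges = []
    while len(clusters) > 1:
        def linkage(pair):
            # Average of all pairwise distances between the two clusters.
            c1, c2 = pair
            return mean(dist[frozenset((a, b))] for a in c1 for b in c2)

        c1, c2 = min(combinations(clusters, 2), key=linkage)
        merges.append((c1, c2, linkage((c1, c2))))
        clusters = [c for c in clusters if c not in (c1, c2)]
        clusters.append(tuple(sorted(c1 + c2)))
    return merges


# Synthetic scores from one hypothetical rater for six applicants.
ratings = {
    "communication": [5, 4, 6, 3, 7, 5],
    "empathy": [5, 4, 6, 3, 6, 5],
    "critical thinking": [3, 6, 4, 5, 4, 6],
    "integrity": [4, 6, 4, 6, 3, 6],
}

merges = cluster_attributes(ratings)
for left, right, d in merges:
    print(f"{left} + {right} at distance {d:.3f}")
```

Attributes that merge early at a small distance are ones this rater scored almost interchangeably. Comparing the merge order across raters, or between faculty and student raters, is one way to look for the kind of differences the abstract describes.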