Boulet John R, Murray David, Kras Joseph, Woodhouse Julie
Foundation for Advancement of International Medical Education and Research, Philadelphia, Pennsylvania 19104, USA.
Simul Healthc. 2008 Summer;3(2):72-81. doi: 10.1097/SIH.0b013e31816e39e2.
In medicine, standard setting methodologies have been developed for both selected-response and performance-based assessments. For simulation-based tasks, research efforts have been directed primarily at assessments that incorporate standardized patients. Mannequin-based evaluations often demand complex, time-sensitive, hierarchically ordered, sequential actions that are difficult to evaluate and score. Moreover, collecting reliable proficiency judgments, necessary to estimate meaningful cut points, can be challenging. The purpose of this investigation was to explore whether expert judgments obtained using an examinee-centered standard setting method that was previously validated for standardized patient-based assessments could be used to set defensible standards for acute-care, mannequin-based scenarios.
Nineteen physicians were recruited to serve as panelists. For each of 12 simulation scenarios, between 8 and 10 performance samples (audio-video recordings), covering the expected ability continuum, were chosen for review. The performance samples were selected from a previously administered evaluation of postgraduate trainees. Based on a consensus definition of readiness to enter unsupervised practice, the panelists made independent judgments of each performance. For each scenario, the association between the panelists' judgments and the assessment scores was summarized and used to estimate a scenario-specific cut score.
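One way to picture the last step is sketched below. The abstract does not say which statistical model was used, so this is an assumption: for one scenario, the proportion of panelists judging each performance "ready" is regressed on the performance score by ordinary least squares, and the cut score is taken as the score at which the fitted proportion reaches 0.5. The function names, the 0.5 threshold, and the data are hypothetical and are not taken from the study.

```python
import math


def pearson_r(scores, prop_ready):
    """Pearson correlation between scores and the proportion of 'ready' judgments."""
    n = len(scores)
    mean_x = sum(scores) / n
    mean_y = sum(prop_ready) / n
    sxx = sum((x - mean_x) ** 2 for x in scores)
    syy = sum((y - mean_y) ** 2 for y in prop_ready)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(scores, prop_ready))
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when either variable is constant")
    return sxy / math.sqrt(sxx * syy)


def estimate_cut_score(scores, prop_ready, threshold=0.5):
    """Score at which the least-squares line predicts `threshold` proportion 'ready'.

    Assumption: a straight-line fit stands in for whatever model the
    authors actually used; the 0.5 threshold is also illustrative.
    """
    n = len(scores)
    if n != len(prop_ready) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(scores) / n
    mean_y = sum(prop_ready) / n
    sxx = sum((x - mean_x) ** 2 for x in scores)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(scores, prop_ready))
    if sxx == 0 or sxy == 0:
        raise ValueError("judgments do not vary with scores; no cut score can be estimated")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return (threshold - intercept) / slope


# Hypothetical data for one scenario: 8 performance samples, each with a
# scenario score and the share of panelists who judged it "ready".
scores = [3.0, 4.5, 5.0, 6.0, 6.5, 7.5, 8.0, 9.0]
prop_ready = [0.05, 0.10, 0.30, 0.45, 0.60, 0.80, 0.85, 0.95]

print("correlation:", round(pearson_r(scores, prop_ready), 3))
print("estimated cut score:", round(estimate_cut_score(scores, prop_ready), 2))
```

Under this sketch, a weak correlation would indicate that the panelists' judgments do not track the scores for that scenario, in which case a cut score estimated this way would carry little meaning.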
For 9 of the scenarios, there was at least a moderately strong relationship between the panelists' aggregate ratings and the performance scores, which allowed meaningful numeric standards to be estimated. For the other 3 scenarios, the aggregate decision rules used by the panelists did not correspond with the achievement measures. For scenarios rated independently by split panels, the estimated cut scores were similar.
An examinee-centered approach, using aggregate expert judgments of audio-video performances, was suitable for setting standards on most acute-care, mannequin-based scenarios. It is necessary, however, to have valid scores for the chosen scenarios and to sample performances across the ability spectrum.