Department of Anesthesia and Perioperative Medicine, Medical University of South Carolina, Charleston, SC 29425, USA.
Simul Healthc. 2012 Aug;7(4):222-35. doi: 10.1097/SIH.0b013e3182590b07.
Defining valid, reliable, defensible, and generalizable standards for the evaluation of learner performance is a key issue in assessing both baseline competence and mastery in medical education. However, before setting these standards of performance, the reliability of the scores yielded by a grading tool must be assessed. Accordingly, the purpose of this study was to assess the reliability of scores generated from a set of grading checklists used by nonexpert raters during simulations of American Heart Association (AHA) Megacodes.
The reliability of scores generated from a detailed set of checklists, when used by 4 nonexpert raters, was tested by grading team leader performance in 8 Megacode scenarios. Videos of the scenarios were reviewed and rated by trained faculty facilitators and a group of nonexpert raters. The videos were reviewed "continuously" and "with pauses." The grades assigned by 2 content experts served as the reference standard, and the scores of 4 nonexpert raters were used to test the reliability of the checklists.
Our results demonstrate that nonexpert raters are able to produce reliable grades when using the checklists under consideration, demonstrating excellent intrarater reliability and agreement with a reference standard. The results also demonstrate that nonexpert raters can be trained in the proper use of the checklist in a short amount of time, with no discernible learning curve thereafter. Finally, our results show that a single trained rater can achieve reliable scores of team leader performance during AHA Megacodes when using our checklist in a continuous mode, because measures of agreement in total scoring were very strong (Lin's concordance correlation coefficient [Biometrics 1989;45:255-268], 0.96; intraclass correlation coefficient, 0.97).
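For readers unfamiliar with the agreement statistic cited above, the following is a minimal sketch of how Lin's concordance correlation coefficient could be computed for one rater's total checklist scores against a reference standard. The function name and the score values are illustrative assumptions only; they do not reproduce the study's data or analysis.

```python
import numpy as np

def lins_ccc(x, y):
    """Lin's concordance correlation coefficient between two score vectors."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()            # population variances (ddof=0)
    cov = ((x - mx) * (y - my)).mean()   # population covariance
    return 2 * cov / (vx + vy + (mx - my) ** 2)

# Hypothetical total checklist scores for 8 Megacode scenarios:
# one nonexpert rater versus the expert reference standard.
rater_scores = [38, 41, 35, 44, 40, 37, 42, 39]
reference_scores = [39, 41, 34, 45, 41, 36, 42, 40]
print(f"Lin's CCC = {lins_ccc(rater_scores, reference_scores):.2f}")
```

An intraclass correlation coefficient for the same data could be obtained in a similar way, for example with a standard statistics package, rather than hand-coded.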
We have shown that our checklists can yield reliable scores, are appropriate for use by nonexpert raters, and can be used for continuous assessment of team leader performance while reviewing a simulated Megacode. This checklist may be more appropriate for use by advanced cardiac life support instructors during Megacode assessments than the current tools provided by the AHA.