Faculty of Medicine, McGill University, Montreal, Quebec, Canada; Departments of Surgery and Oncology, McGill University Health Center, Montreal, Quebec, Canada.
Department of Educational and Counselling Psychology, McGill University, Montreal, Quebec, Canada.
J Surg Educ. 2018 May-Jun;75(3):779-786. doi: 10.1016/j.jsurg.2017.08.024. Epub 2017 Sep 18.
Undergraduate medical students at a large academic trauma center are required to manage a series of online virtual trauma patients as a mandatory exercise during their surgical rotation.
Clinical reasoning during undergraduate medical education can be difficult to assess. The purpose of this study was to determine whether components of the students' virtual patient management could be used to measure changes in their clinical reasoning over the course of the clerkship year. To accomplish this, we examined whether scoring rubrics could shift the traditional subjective assessment toward a more objective evaluation.
Two groups of students, one at the beginning of clerkship (Juniors) and one at the end of clerkship (Seniors), were chosen. Each group was given the same virtual patient case, a clinical scenario based on the Advanced Trauma Life Support (ATLS) Primary Trauma Survey, which had to be completed during their trauma rotation. The learner was required to make several key patient-management choices based on their clinical reasoning, which took them along different routes through the case. At the end of the case, they had to create a summary report akin to a sign-off. These summaries were graded independently by two domain "Experts" using a traditional subjective surgical approach to assessment and by two "Non-Experts" using two internally validated scoring rubrics. One rubric assessed procedural or domain knowledge (Procedural rubric), while the other highlighted semantic qualifiers (Semantic rubric). Each rubric was designed to reflect established components of clinical reasoning. Student's t-tests were used to compare the rubric scores for the two groups, and Cohen's d was used to determine effect size. Kendall's τ was used to compare the difference between the two groups based on the "Experts'" subjective assessment. Inter-rater reliability (IRR) was determined using Cronbach's alpha.
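The statistical workflow described above (Student's t-test with Cohen's d for effect size, Kendall's τ against the subjective marks, and Cronbach's alpha for IRR) can be sketched as follows. This is a minimal illustration on hypothetical, randomly generated scores: the sample sizes, cohort means, rater noise, and random seed are all assumptions for demonstration, not the study's data.

```python
# Minimal sketch of the analyses described above, on hypothetical data.
# Sample sizes, means, and SDs are illustrative assumptions only.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical Procedural rubric scores (percent) for the two cohorts.
juniors = rng.normal(51, 12, 40)
seniors = rng.normal(59, 13, 40)

# Student's t-test comparing cohort means.
t, p = stats.ttest_ind(seniors, juniors)

def cohens_d(a, b):
    """Cohen's d using the pooled standard deviation."""
    na, nb = len(a), len(b)
    pooled = np.sqrt(((na - 1) * a.var(ddof=1) + (nb - 1) * b.var(ddof=1))
                     / (na + nb - 2))
    return (a.mean() - b.mean()) / pooled

d = cohens_d(seniors, juniors)

# Kendall's tau between cohort membership (0 = Junior, 1 = Senior)
# and hypothetical subjective "Expert" marks.
group = np.r_[np.zeros(40), np.ones(40)]
expert_marks = np.r_[rng.normal(5.0, 2.0, 40), rng.normal(5.5, 2.0, 40)]
tau, tau_p = stats.kendalltau(group, expert_marks)

def cronbach_alpha(ratings):
    """Cronbach's alpha across raters (rows = students, cols = raters)."""
    ratings = np.asarray(ratings)
    k = ratings.shape[1]
    item_var = ratings.var(axis=0, ddof=1).sum()
    total_var = ratings.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1.0 - item_var / total_var)

# Two hypothetical "Non-Expert" raters scoring the same 40 summaries.
true_scores = rng.normal(50, 10, 40)
ratings = np.column_stack([true_scores + rng.normal(0, 3, 40),
                           true_scores + rng.normal(0, 3, 40)])
alpha = cronbach_alpha(ratings)

print(f"t = {t:.3f}, p = {p:.3f}, Cohen's d = {d:.2f}")
print(f"Kendall's tau = {tau:.3f} (p = {tau_p:.3f})")
print(f"Cronbach's alpha = {alpha:.2f}")
```

Note that with only two raters, Cronbach's alpha for this design reduces to the Spearman-Brown-adjusted inter-rater correlation, which is why a small amount of rater noise relative to the spread of true scores yields the high reliability values reported.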
The Seniors did better than the Juniors with respect to "Procedural" issues but not "Semantic" issues when assessed by the "Non-Experts" using the rubrics. The average Procedural rubric score for the Senior group was 59% ± 13%, while for the Junior group it was 51% ± 12% (t = 2.715; p = 0.008; Cohen's d = 1.53). The average Semantic rubric score for the Senior group was 31% ± 15%, while for the Junior group it was 28% ± 14% (t = 1.010; p = 0.316, ns). There was no statistically significant difference between the marks given to the Senior and Junior groups by the "Experts" (Kendall's τ = 0.182, p = 0.07). The IRR between the "Non-Experts" using the rubrics was higher than that between the "Experts" using the traditional surgical approach to assessment. Cronbach's alpha for the Procedural and Semantic rubrics was 0.94 and 0.97, respectively, indicating very high IRR. The correlation between the Procedural rubric scores and the "Experts'" assessment was approximately r = 0.78, and that between the Semantic rubric and the "Experts'" assessment was roughly r = 0.66, indicating high concurrent validity for the Procedural rubric and moderately high validity for the Semantic rubric.
Clinical reasoning, as measured by some of its "procedural" features, improves over the course of the clerkship year. Rubrics can be created to objectively assess the summary statement of an online interactive trauma virtual patient (VP) for "procedural" issues but not for "semantic" issues. Using IRR as a measure, the rubrics improve the quality of assessment. The "Procedural" rubric appears to measure changes in clinical reasoning over the course of third-year undergraduate clinical studies.