Department of Neurology, University of Toronto, Toronto, Ontario, Canada.
Department of Medical Education, University of Illinois at Chicago College of Medicine, Chicago, Illinois, USA.
BMJ Qual Saf. 2019 Nov;28(11):925-933. doi: 10.1136/bmjqs-2018-008689. Epub 2019 Apr 17.
To develop neurology scenarios for use with the Quality Improvement Knowledge Application Tool Revised (QIKAT-R), gather and evaluate validity evidence, and project the impact of scenario number, rater number and rater type on score reliability.
Six neurological case scenarios were developed. Residents were randomly assigned three scenarios before and after a quality improvement (QI) course in 2015 and 2016. For each scenario, residents crafted an aim statement, selected a measure and proposed a change to address a quality gap. Responses were scored by six faculty raters (two with and four without QI expertise) using the QIKAT-R. Validity evidence from content, response process, internal structure, relations to other variables and consequences was collected. A generalisability (G) study examined sources of score variability, and decision analyses estimated projected reliability for different numbers of raters and scenarios and raters with and without QI expertise.
Raters scored 163 responses from 28 residents. The mean QIKAT-R score was 5.69 (SD 1.06). G-coefficient and Phi-coefficient were 0.65 and 0.60, respectively. Interrater reliability was fair for raters without QI expertise (intraclass correlation = 0.53, 95% CI 0.30 to 0.72) and acceptable for raters with QI expertise (intraclass correlation = 0.66, 95% CI 0.02 to 0.88). Postcourse scores were significantly higher than precourse scores (6.05, SD 1.48 vs 5.22, SD 1.5; p < 0.001). Sufficient reliability for formative assessment (G-coefficient > 0.60) could be achieved by three raters scoring six scenarios or two raters scoring eight scenarios, regardless of rater QI expertise.
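The decision-study projection reported above can be illustrated with a minimal sketch. It assumes a fully crossed person x scenario x rater design, which may differ from the design actually analysed (residents here received only a subset of scenarios). The abstract does not report variance components, so the values in `HYPOTHETICAL_VC` are invented placeholders, and the names `VarianceComponents`, `g_coefficient` and `phi_coefficient` are illustrative, not taken from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VarianceComponents:
    """Variance components for a crossed person x scenario x rater design."""
    person: float           # true-score variance (object of measurement)
    scenario: float         # scenario difficulty
    rater: float            # rater stringency
    person_scenario: float  # person-by-scenario interaction
    person_rater: float     # person-by-rater interaction
    scenario_rater: float   # scenario-by-rater interaction
    residual: float         # three-way interaction confounded with error


def g_coefficient(vc: VarianceComponents, n_scenarios: int, n_raters: int) -> float:
    """Relative (norm-referenced) coefficient: only interactions with person count as error."""
    relative_error = (
        vc.person_scenario / n_scenarios
        + vc.person_rater / n_raters
        + vc.residual / (n_scenarios * n_raters)
    )
    return vc.person / (vc.person + relative_error)


def phi_coefficient(vc: VarianceComponents, n_scenarios: int, n_raters: int) -> float:
    """Absolute (criterion-referenced) coefficient: facet main effects also count as error."""
    absolute_error = (
        vc.scenario / n_scenarios
        + vc.rater / n_raters
        + vc.scenario_rater / (n_scenarios * n_raters)
        + vc.person_scenario / n_scenarios
        + vc.person_rater / n_raters
        + vc.residual / (n_scenarios * n_raters)
    )
    return vc.person / (vc.person + absolute_error)


# HYPOTHETICAL placeholder values, NOT the study's estimates.
HYPOTHETICAL_VC = VarianceComponents(
    person=1.0, scenario=0.5, rater=0.5,
    person_scenario=1.0, person_rater=1.0, scenario_rater=0.5, residual=2.0,
)

if __name__ == "__main__":
    # Project both coefficients over a small grid of raters and scenarios.
    for n_r in (1, 2, 3):
        for n_s in (3, 6, 8):
            g = g_coefficient(HYPOTHETICAL_VC, n_s, n_r)
            phi = phi_coefficient(HYPOTHETICAL_VC, n_s, n_r)
            print(f"raters={n_r} scenarios={n_s} G={g:.2f} Phi={phi:.2f}")
```

Because the Phi-coefficient adds the scenario and rater main effects to the error term, it can never exceed the G-coefficient for the same design, which is consistent with the reported values of 0.60 and 0.65.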
Validity evidence was sufficient to support the use of the QIKAT-R with multiple scenarios and raters to assess resident QI knowledge application for formative or low-stakes summative purposes. The results provide practical information for educators to guide implementation decisions.