Ward Krista, Kinney Kathy, Patania Rhina, Savage Linda, Motley Jamie, Smith Monica
J Chiropr Educ. 2019 Oct;33(2):140-144. doi: 10.7899/JCE-18-9. Epub 2019 Mar 27.
Clinical competency is integral to the doctor of chiropractic program and is dictated by the Council on Chiropractic Education accreditation standards. Because these meta-competencies are demonstrated through open-ended tasks, achieving interrater agreement among multiple graders can be challenging. We developed a new analytic rubric for a clinical case-based education program and tested its interrater agreement.
Clinical educators and research staff collaborated on rubric development and testing over four phases. Phase 1 tailored existing institutional rubrics to the new clinical case-based program using a 4-level scale of proficiency. Phase 2 tested the performance of the pilot rubric using 16 senior intern assessments graded by four instructors using pre-established grading keys. Phases 3 and 4 refined and retested rubric versions 1 and 2 on 16 and 14 assessments, respectively.
Exact, adjacent, and pass/fail agreements between six pairs of graders were reported. The pilot rubric achieved 46% average exact, 80% average adjacent, and 63% pass/fail agreements. Rubric version 1 yielded 49% average exact, 86% average adjacent, and 70% pass/fail agreements. Rubric version 2 yielded 60% average exact, 93% average adjacent, and 81% pass/fail agreements.
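The three agreement measures can be illustrated with a minimal sketch. It assumes that "adjacent" means two scores within one level of each other on the 4-level scale (so exact matches also count), that levels 3 and 4 are passing, and that averages are taken over all grader pairs; none of these definitions is spelled out in the abstract. The scores and the names `scores`, `PASS_THRESHOLD`, `pair_agreement`, and `average_agreement` are hypothetical and are not study data.

```python
from itertools import combinations

# Hypothetical scores: 4 graders, 5 assessments, 4-level scale (1-4).
# Illustrative numbers only; these are NOT data from the study.
scores = {
    "grader_a": [4, 3, 2, 1, 3],
    "grader_b": [4, 2, 2, 2, 3],
    "grader_c": [3, 3, 1, 1, 4],
    "grader_d": [4, 3, 3, 2, 1],
}

PASS_THRESHOLD = 3  # assumption: levels 3 and 4 count as a pass


def pair_agreement(a, b, pass_threshold=PASS_THRESHOLD):
    """Return percent exact, adjacent, and pass/fail agreement for two graders."""
    n = len(a)
    exact = sum(x == y for x, y in zip(a, b))
    # Assumption: adjacent agreement includes exact matches (difference <= 1).
    adjacent = sum(abs(x - y) <= 1 for x, y in zip(a, b))
    pass_fail = sum(
        (x >= pass_threshold) == (y >= pass_threshold) for x, y in zip(a, b)
    )
    return {
        "exact": 100 * exact / n,
        "adjacent": 100 * adjacent / n,
        "pass_fail": 100 * pass_fail / n,
    }


def average_agreement(all_scores):
    """Average each agreement measure over every unordered pair of graders."""
    pairs = list(combinations(all_scores, 2))
    per_pair = [pair_agreement(all_scores[g1], all_scores[g2]) for g1, g2 in pairs]
    averages = {
        key: sum(p[key] for p in per_pair) / len(per_pair)
        for key in ("exact", "adjacent", "pass_fail")
    }
    return averages, len(pairs)


if __name__ == "__main__":
    averages, n_pairs = average_agreement(scores)
    print(f"{n_pairs} grader pairs")
    for measure, value in averages.items():
        print(f"{measure}: {value:.1f}%")
```

With four graders, `combinations` yields six unordered pairs, which matches the six pairs of graders reported above.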
Our results are similar to those of other rubric interrater reliability studies. Interrater reliability improved with later versions of the rubric, likely because of rater learning and rubric refinement. Future studies should focus on concurrent validity and on comparing student performance with grade point average and national board scores.