Roberts Chris, Shadbolt Narelle, Clark Tyler, Simpson Phillip
Sydney Medical School - Northern, University of Sydney, Hornsby Ku-ring-gai Hospital, Palmerston Road, Sydney, NSW 2077, Australia.
BMC Med Educ. 2014 Sep 20;14:197. doi: 10.1186/1472-6920-14-197.
Little is known about the technical adequacy of portfolios in reporting multiple complex academic and performance-based assessments. We explored, first, the factors influencing the precision of scoring within a programmatic assessment of student learning outcomes in an integrated clinical placement and, second, the degree to which validity evidence supported the interpretation of student scores.
Within generalisability theory, we estimated the contribution that the wanted factor (i.e. student capability) and unwanted factors (e.g. the impact of assessors) made to the variation in portfolio task scores. Relative and absolute standard errors of measurement provided a confidence interval around a pre-determined pass/fail standard across all six tasks. Validity evidence was sought by demonstrating the internal consistency of the portfolio and exploring the relationship of student scores with clinical experience.
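In generalisability theory, the relative and absolute standard errors of measurement are derived from estimated variance components. A minimal sketch of that calculation is below, assuming a fully crossed person x task x rater design with one rater per task; the variance components here are illustrative placeholders, not the study's estimates.

```python
import math

# Hypothetical variance components for a person (p) x task (t) x rater (r)
# design -- illustrative numbers only, not the study's estimates.
var = {
    "p": 5.0,      # wanted: true student capability
    "t": 1.0,      # task difficulty
    "r": 2.0,      # rater stringency
    "pt": 22.0,    # context specificity (person x task interaction)
    "pr": 13.0,    # rater subjectivity (person x rater interaction)
    "tr": 0.5,     # task x rater interaction
    "ptr,e": 2.0,  # residual / unexplained error
}

n_t, n_r = 6, 1  # six tasks, one rater per task (assumed)

# Relative error variance: only terms interacting with the person facet,
# since main effects of task/rater cancel when comparing students.
rel_err = var["pt"] / n_t + var["pr"] / n_r + var["ptr,e"] / (n_t * n_r)

# Absolute error variance: all sources except true person variance,
# relevant when scores are compared to a fixed pass/fail cut score.
abs_err = rel_err + var["t"] / n_t + var["r"] / n_r + var["tr"] / (n_t * n_r)

sem_rel = math.sqrt(rel_err)
sem_abs = math.sqrt(abs_err)
print(f"relative SEM = {sem_rel:.2f}, absolute SEM = {sem_abs:.2f}")
```

The absolute SEM is always at least as large as the relative SEM, because judging against a fixed standard also exposes students to task-difficulty and rater-stringency effects.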
The mean portfolio mark for 257 students, across 372 raters, based on six tasks, was 75.56 (SD, 6.68). For a single student on one assessment task, 11% of the variance in scores was due to true differences in student capability. The most significant interaction was context specificity (49%), the tendency for one student to engage with one task but not with another. Rater subjectivity accounted for 29%. An absolute standard error of measurement of 4.74% gave a 95% CI of +/- 9.30%, and a 68% CI of +/- 4.74%, around a pass/fail score of 57%. Construct validity was supported by the demonstration of an assessment framework, the internal consistency of the portfolio tasks, and higher scores for students who did the clinical placement later in the academic year.
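The confidence intervals reported above follow directly from the absolute SEM: a 68% interval is +/- 1 SEM and a 95% interval is +/- 1.96 SEM. A short check of that arithmetic, using the figures from the abstract:

```python
sem_abs = 4.74  # absolute SEM, in percentage points (from the abstract)
cutoff = 57.0   # pre-determined pass/fail standard

ci68 = 1.00 * sem_abs  # 68% CI half-width: +/- 1 SEM
ci95 = 1.96 * sem_abs  # 95% CI half-width: +/- 1.96 SEM

print(f"68% CI: {cutoff} +/- {ci68:.2f}")  # +/- 4.74
print(f"95% CI: {cutoff} +/- {ci95:.2f}")  # +/- 9.29 (reported as 9.30, presumably rounded)
```

A student scoring within roughly one SEM of the 57% cut score therefore cannot be confidently classified as passing or failing at the 68% level, which is the practical meaning of the "modest precision" noted in the conclusions.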
A portfolio designed as a programmatic assessment of an integrated clinical placement has sufficient evidence of validity to support a specific interpretation of student scores around passing a clinical placement. It has modest precision in assessing students' achievement of a competency standard. There were identifiable areas for reducing measurement error and providing more certainty around decision-making. Reducing the measurement error would require engaging with the student body on the value of the tasks, more focussed academic and clinical supervisor training, and revisiting the rubric of the assessment in the light of feedback.