Department of Cardiology, The Canberra Hospital, Garran, ACT 2605, Australia.
Adv Health Sci Educ Theory Pract. 2011 Aug;16(3):405-25. doi: 10.1007/s10459-011-9296-1. Epub 2011 May 24.
Even though rater-based judgements of clinical competence are widely used, they are context-sensitive and vary between individuals and institutions. To deal adequately with rater-judgement unreliability, it is essential to evaluate the reliability of rater-based workplace assessments in the local context. Accordingly, the primary aims of this study were to quantify the trainee-attributable variation in supervisor ratings, to estimate the number of workplace assessments required for certification of competence, and to position the findings within the existing literature. This reliability study of workplace-based supervisors' assessments of trainees has a rater-nested-within-trainee design. The score variance attributable to the trainee (the trainee variance component) for each competency item assessed was estimated using the minimum-norm quadratic unbiased estimator, and these variance components were then used to estimate the number of assessments needed to reach a reliability of 0.80. Across the 14 competency items, the trainee score variance ranged from 2.3% for emergency skills to 35.6% for communication skills, with an average of 20.3% across all items; for the "Overall rating" item the trainee variance was 28.8%. These variance components translated into 169, 7, 17 and 28 assessments needed for a reliability of 0.80, respectively. Most of the variation in assessment scores was due to measurement error, ranging from 97.7% for emergency skills to 63.4% for communication skills. Similar results have been demonstrated in previously published studies. In summary, supervisors' workplace-based assessments overall have poor reliability and, in their current form, are not suitable for use in certification processes. The marked variation in supervisors' reliability across different competencies indicates that some, in this case communication and possibly overall competence, may be assessed with acceptable reproducibility. However, any continued use of this format for assessing trainee competencies requires identifying what supervisors in different institutions can reliably assess, rather than continuing to impose false expectations on unreliable assessments.
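A minimal sketch of the decision-study arithmetic implied by these figures, assuming the standard single-facet generalizability relationship reliability = n * var(trainee) / (n * var(trainee) + var(error)), solved for the number of ratings n needed to reach 0.80. The competency labels and variance proportions are those reported above; the function name and the treatment of all non-trainee variance as undifferentiated error are assumptions, so the printed values are illustrative and may differ slightly from the published figures where additional error facets enter the full rater-nested-within-trainee design.

# Illustrative decision-study calculation (not the authors' code), assuming the
# single-facet generalizability formula described in the lead-in above.

def ratings_needed(trainee_var_proportion, target_reliability=0.80):
    """Approximate number of ratings needed for averaged scores to reach the
    target reliability, given the proportion of total score variance that is
    attributable to the trainee (all remaining variance treated as error)."""
    error_proportion = 1.0 - trainee_var_proportion
    return (target_reliability / (1.0 - target_reliability)) * (
        error_proportion / trainee_var_proportion
    )

# Trainee variance proportions reported in the abstract.
reported = {
    "emergency skills": 0.023,
    "communication skills": 0.356,
    "average across 14 items": 0.203,
    "overall rating": 0.288,
}

for item, proportion in reported.items():
    print(f"{item}: about {ratings_needed(proportion):.1f} assessments for reliability 0.80")

Under this simplified model the calculation reproduces the order of magnitude of the published figures (roughly 170 assessments for emergency skills and about 7 for communication skills); exact agreement is not expected from the abstract alone.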