Wanderer Jonathan P, de Oliveira Filho Getulio R, Rothman Brian S, Sandberg Warren S, McEvoy Matthew D
From the Departments of Anesthesiology (J.P.W., B.S.R., W.S.S., M.D.M.) and Biomedical Informatics (J.P.W.), Vanderbilt University Medical Center, Nashville, Tennessee; and Department of Surgery, Federal University of Santa Catarina, Florianópolis, Brazil (G.R.d.O.F.).
Anesthesiology. 2018 Jan;128(1):144-158. doi: 10.1097/ALN.0000000000001919.
Assessment of clinical competence is essential for residency programs and should be guided by valid, reliable measurements. We implemented Baker's Z-score system, which produces measures of traditional core competency assessments and clinical performance summative scores. Our goal was to validate use of summative scores and estimate the number of evaluations needed for reliable measures.
We performed generalizability studies to estimate the variance components of raw and Z-transformed absolute and peer-relative scores and decision studies to estimate the evaluations needed to produce at least 90% reliable measures for classification and for high-stakes decisions. A subset of evaluations was selected representing residents who were evaluated frequently by faculty who provided the majority of evaluations. Variance components were estimated using ANOVA.
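The decision-study step can be illustrated with a minimal sketch. It assumes a simplified one-facet design (residents crossed with evaluation occasions), which may not match the design actually used in the study. Under that assumption, the generalizability coefficient for the mean of n occasions is var_resident / (var_resident + var_error / n), and solving for n gives the number of occasions needed to reach a target reliability. The function names and the variance values below are hypothetical and are not taken from the paper.

```python
import math


def g_coefficient(var_resident: float, var_error: float, n_occasions: int) -> float:
    """Reliability of a resident's mean score over n_occasions evaluations.

    Assumes a one-facet (resident x occasion) design in which var_error is the
    residual variance (interaction confounded with error) for one evaluation.
    """
    return var_resident / (var_resident + var_error / n_occasions)


def occasions_needed(var_resident: float, var_error: float, target: float = 0.90) -> int:
    """Smallest number of occasions whose mean reaches the target reliability."""
    if not 0.0 < target < 1.0:
        raise ValueError("target must be strictly between 0 and 1")
    if var_resident <= 0.0 or var_error < 0.0:
        raise ValueError("var_resident must be positive and var_error non-negative")
    n = (target / (1.0 - target)) * (var_error / var_resident)
    # Subtract a tiny tolerance so floating-point noise does not round up
    # a value that is an exact integer in real arithmetic.
    return max(1, math.ceil(n - 1e-9))


if __name__ == "__main__":
    # Hypothetical variance components, for illustration only.
    var_resident, var_error = 1.0, 4.0
    n = occasions_needed(var_resident, var_error, target=0.90)
    print(n, round(g_coefficient(var_resident, var_error, n), 3))
```

The sketch shows why the required number of occasions grows as the resident variance component shrinks relative to the error variance: the ratio var_error / var_resident scales the result directly.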
Principal component extraction from 8,754 complete evaluations demonstrated that a single factor explained 91% and 85% of variance for absolute and peer-relative scores, respectively. In total, 1,200 evaluations were selected for generalizability and decision studies. The major variance component for all scores was resident interaction with measurement occasions. Variance due to the resident component was strongest with raw scores, where 30 evaluation occasions produced 90% reliable measurements with absolute scores and 58 with peer-relative scores. For Z-transformed scores, 57 evaluation occasions produced 90% reliable measurements with absolute scores and 55 with peer-relative scores. The results were similar for high-stakes decisions.
The Baker system produced moderately reliable measures at our institution, suggesting that it may be generalizable to other training programs. Raw absolute scores required relatively few assessment occasions (30) to achieve 90% reliable measurements.